Abstract

The notion of vertex sparsification (in particular cut-sparsification) is introduced in fTF], where it was shown that for any graph $G = (V,E)$ and a subset of $k$ terminals $K \subset V$, there is a polynomial time algorithm to construct a graph $H = (K,E_H)$ on just the terminal set so that simultaneously for all cuts $(A,K-A)$, the value of the minimum cut in $G$ separating $A$ from $K-A$ is approximately the same as the value of the corresponding cut in $H$. Then approximation algorithms can be run directly on $H$ as a proxy for running on $G$, yielding approximation guarantees independent of the size of the graph. In this work, we consider how well cuts in the sparsifier $H$ can approximate the minimum cuts in $G$, and whether algorithms that use such reductions need to incur a multiplicative penalty in the approximation guarantee depending on the quality of the sparsifier.

We give the first super-constant lower bounds for how well a cut-sparsifier $H$ can simultaneously approximate all minimum cuts in $G$. We prove a lower bound of $\Omega(\log^{1/4} k)$; this is polynomially related to the known upper bound of $O(\log k/\log\log k)$. This is an exponential improvement on the $\Omega(\log\log k)$ bound given in [15], which in fact was for a stronger vertex sparsification guarantee, and did not apply to cut sparsifiers.
Despite this negative result, we show that for many natural problems, we do not need to incur a multiplicative penalty for our reduction. Roughly, we show that any rounding algorithm which also works for the 0-extension relaxation can be used to construct good vertex-sparsifiers for which the optimization problem is easy. Using this, we obtain optimal $O(\log k)$-competitive Steiner oblivious routing schemes, which generalize the results in [21]. We also demonstrate that for a wide range of graph packing problems (which includes maximum concurrent flow, maximum multiflow and multicast routing, among others, as a special case), the integrality gap of the linear program is always at most $O(\log k)$ times the integrality gap restricted to trees. This result helps to explain the ubiquity of the $O(\log k)$ guarantees for such problems. Lastly, we use our ideas to give an efficient construction for vertex-sparsifiers that match the current best existential results; this was previously open. Our algorithm makes novel use of Earth-mover constraints.
1 Introduction



1.1 Background 

The notion of vertex sparsification (in particular cut-sparsification) is introduced in lITSl: Given a graph $G = (V, E)$ and a subset of terminals $K \subset V$, the goal is to construct a graph $H = (K, E_H)$ on just the terminal set so that simultaneously for all cuts $(A,K-A)$, the value of the minimum cut in $G$ separating $A$ from $K-A$ is approximately the same as the value of the corresponding cut in $H$. If for all cuts $(A,K-A)$, the value of the cut in $H$ is at least the value of the corresponding minimum cut in $G$ and is at most $\alpha$ times this value, then we call $H$ a cut-sparsifier of quality $\alpha$.

The motivation for considering such questions is in obtaining approximation algorithms with guarantees that are independent of the size of the graph. For many graph partitioning and multicommodity flow questions, the value of the optimum solution can be approximated given just the values of the minimum cut separating $A$ from $K-A$ in $G$ (for every $A \subset K$). As a result, the value of the optimum solution is approximately preserved when mapping the optimization problem to $H$. So approximation algorithms can be run on $H$ as a proxy for running directly on $G$, and because the size (number of nodes) of $H$ is $|K|$, any approximation algorithm that achieves a $\mathrm{poly}(\log |V|)$-approximation guarantee in general will achieve a $\mathrm{poly}(\log |K|)$ approximation guarantee when run on $H$ (provided that the quality $\alpha$ is also $\mathrm{poly}(\log |K|)$). Feasible solutions in $H$ can also be mapped back to feasible solutions in $G$ for many of these problems, so polynomial time constructions for good cut-sparsifiers yield black box techniques for designing approximation algorithms with guarantees $\mathrm{poly}(\log |K|)$ (and independent of the size of the graph).

In addition to being useful for designing approximation algorithms with improved guarantees, the notion of 
cut-sparsification is also a natural generalization of many methods in combinatorial optimization that attempt to 
preserve certain cuts in G (as opposed to all minimum cuts) in a smaller graph H - for example Gomory-Hu Trees, 
and Mader's Theorem. Here we consider a number of questions related to cut-sparsification: 

1. Is there a super-constant lower bound on the quality of cut-sparsifiers? Do the best (or even near-best) 
cut-sparsifiers necessarily result from (a distribution on) contractions? 

2. Do we really need to pay a price (in the approximation guarantee) when applying vertex sparsification to an 
optimization problem? 

3. Can we construct (in polynomial time) cut-sparsifiers with quality as good as the current best existential 
results? 

We resolve all of these questions in this paper. In the following subsections, we will describe what is currently known about each of these questions, our results, and our techniques.

1.2 Super-Constant Lower Bounds and Separations

In ifTSl, it is proven that in general there are always cut-sparsifiers $H$ of quality at most $O(\log k/\log\log k)$. In fact, if $G$ excludes any fixed minor then this bound improves to $O(1)$. Yet prior to this work, no super-constant lower bound was known for the quality of cut-sparsifiers in general. We prove

Theorem 1. There is an infinite family of graphs that admits no cut-sparsifiers of quality better than $\Omega(\log^{1/4} k)$.

¹ Recently, it has come to our attention that, independent of and concurrent to our work, Makarychev and Makarychev, and independently, Englert, Gupta, Krauthgamer, Raecke, Talgam and Talwar obtained results similar to some in this paper.



Some results are known in more general settings. In particular, one could require that the graph $H$ not only approximately preserve minimum cuts but also approximately preserve the congestion of all multicommodity flows (with demand endpoints restricted to be in the terminal set). This notion of vertex-sparsification is referred to as flow-sparsification (see [15]) and admits a similar definition of quality. [15] gives a lower bound of $\Omega(\log\log k)$ for the quality of flow-sparsifiers. However, this does not apply to cut sparsifiers and in fact, for the example given in [15], there is an $O(1)$-quality cut-sparsifier!

Additionally, there are examples in which cuts can be preserved within a constant factor, yet flows cannot: Benczur and Karger [3] proved that given any graph on $n$ nodes, there is a sparse (weighted) graph $G'$ that approximates all cuts in $G$ within a multiplicative $(1 + \epsilon)$ factor, but one provably cannot preserve the congestion of all multicommodity flows within a factor better than Ω( , ) on a sparse graph (consider the complete graph $K_n$). So here the limits of sparsification are much different for cuts than for flows.

In this paper, we give a super-constant lower bound on the quality of cut-sparsifiers in general and in fact this implies a stronger lower bound than is given in [15]. Our bound is polynomially related to the current best upper bound, which is $O(\log k/\log\log k)$.

We note that the current best upper bound is actually a reduction from the upper bound on the integrality gap of a particular LP relaxation for the 0-extension problem [6], [8]. The integrality gap of this LP relaxation is known to be $\Omega(\sqrt{\log k})$. Yet, the best lower bound we are able to obtain here is $\Omega(\log^{1/4} k)$. This leads us to our next question: Do integrality gaps for the 0-extension LP immediately imply lower bounds for cut-sparsification? This question, as we will see, is essentially equivalent to the question of whether or not the best cut-sparsifiers necessarily come from a distribution on contractions.

Lower bounds on the quality of cut-sparsifiers (in this paper) and flow-sparsifiers ([15]) are substantially more complicated than integrality gap examples for the 0-extension LP relaxation. If the best cut-sparsifiers or flow-sparsifiers were actually always generated from some distribution on contractions in the original graph via strong duality (see Section 3), any integrality gap would immediately imply a lower bound for cut-sparsification or flow-sparsification. But as we demonstrate here, this is not the case:

Theorem 2. There is an infinite family of graphs so that the ratio of the best quality cut-sparsifier to the best quality cut-sparsifier that can be achieved through a distribution on contractions is o(1) =

We also note that in order to prove this result we establish a somewhat surprising connection between cut- 
sparsification and the harmonic analysis of Boolean functions. The particular cut-sparsifier that we construct in 
order to prove this result is inspired by the noise stability operator, and as a result, we can use tools from harmonic 
analysis (Bourgain's Junta Theorem [T] and the Hypercontractive Inequality [T], [T|) to analyze the quality of the 
cut-sparsifier. Casting this question of bounding the quality as a question in harmonic analysis allows us to reason 
about many cuts simultaneously without worrying about the messy details of the combinatorics. 

1.3 Abstract Integrality Gaps and Rounding Algorithms 

As described earlier, running an approximation algorithm on the sparsifier $H = (K,E_H)$ as a proxy for the graph $G = (V,E)$ pays an additional price in the approximation guarantee that corresponds to how well $H$ approximates $G$. Here we consider the question of whether this loss can be avoided.

As a motivating example, consider the problem of Steiner oblivious routing LIS I. Previous techniques for constructing Steiner oblivious routing schemes ifTSl, ifTSl first construct a flow-sparsifier $H$ for $G$, construct an oblivious routing scheme in $H$ and then map this back to a Steiner oblivious routing scheme in $G$. Any such approach must pay a price in the competitive ratio, and cannot achieve an $O(\log k)$-competitive guarantee because (for example) expanders do not admit constant factor flow-sparsifiers lITSl.

So black box reductions pay a price in the competitive ratio, yet here we present a technique for combining the flow-sparsification techniques in [15] and the oblivious routing constructions in [21] into a single step, and we prove that there are $O(\log k)$-competitive Steiner oblivious routing schemes, which is optimal. This result is a corollary of a more general idea:

The constructions of flow-sparsifiers given in [15] (which is an extension of the techniques in fTE\) can be regarded as a dual to the rounding algorithm in lH for the 0-extension problem. What we observe here is: Suppose we are given a rounding algorithm that is used to round the fractional solution of some relaxation to an integral solution for some optimization problem. If this rounding algorithm also works for the relaxation for the 0-extension problem given in [12] (and also used in [6], [8]), then we can use the techniques in ifTSl, ifTSll to obtain stronger flow-sparsifiers which are not only good quality flow-sparsifiers, but also for which the optimization problem is easy. So in this way we do not need to pay an additional price in the approximation guarantee in order to replace the dependence on $n$ with a dependence on $k$. With these ideas in mind, what we observe is that the rounding algorithm in [9], which embeds metric spaces into distributions on dominating tree-metrics, can also be used to round the 0-extension relaxation. This allows us to construct flow-sparsifiers that have $O(\log k)$-quality, and also can be explicitly written as a convex combination of 0-extensions that are tree-like. On trees, oblivious routing is easy, and so this gives us a way to simultaneously construct good flow-sparsifiers and good oblivious routing schemes on the sparsifier in one step!

Of course, the rounding algorithm in [9] for embedding metric spaces into distributions on dominating tree-metrics is a very common first step in rounding fractional relaxations of graph partitioning, graph layout and clustering problems. So for all problems that use this embedding as the main step, we are able to replace the dependence on $n$ with dependence on $k$, and we do not introduce any additional poly-logarithmic factors as in previous work! One can also interpret our result as giving a generalization of the hierarchical decompositions given in tTDi for approximating the cuts in a graph $G$ on trees. We state our results more formally below, and we refer to such a statement as an Abstract Integrality Gap.

Definition 1. We call a fractional packing problem $P$ a graph packing problem if the goal of the dual covering problem $D$ is to minimize the ratio of the total units of distance × capacity allocated in the graph divided by some monotone increasing function of the distances between terminals.

This definition is quite general, and captures maximum concurrent flow, maximum multiflow, and multicast routing as special cases, in addition to many other common optimization problems. The integral² dual $ID$ problems are generalized sparsest cut, multicut and requirement cut respectively.

Theorem 3. For any graph packing problem $P$, the maximum ratio of the integral dual to the fractional primal is at most $O(\log k)$ times the maximum ratio restricted to trees.

For a packing problem that fits into this class, this theorem allows us to reduce bounding the integrality gap in general graphs to bounding the integrality gap on trees, which is often substantially easier than for general graphs (e.g. for the example problems given above). We believe that this result helps to explain the intrinsic robustness of fractional packing problems into undirected graphs, in particular the ubiquity of the $O(\log k)$ bound for the flow-cut gap for a wide range of multicommodity flow problems.

We also give a polynomial time algorithm to reduce any graph packing problem P to a corresponding problem 
on a tree: Again, let K be the set of terminals. 

Definition 2. Let OPT{P,G) be the optimal value of the fractional graph packing problem P on the graph G. 

Theorem 4. There is a polynomial time algorithm to construct a distribution $\mu$ on (a polynomial number of) trees on the terminal set $K$, s.t.

$$E_{T \leftarrow \mu}[OPT(P,T)] \le O(\log k)\, OPT(P,G)$$

² The notion of what constitutes an integral solution depends on the problem. In some cases, it translates to the requirement that the distances are all 0 or 1, and in other cases it can mean something else. The important point is that the notion of integral just defines a class of admissible metrics, as opposed to arbitrary metrics which can arise in the packing problem.



and such that any valid integral dual of cost $C$ (for any tree $T$ in the support of $\mu$) can be immediately transformed into a valid integral dual in $G$ of cost at most $C$.

As a corollary, given an approximation algorithm that achieves an approximation ratio of $C$ for the integral dual to a graph packing problem on trees, we obtain an approximation algorithm with a guarantee of $O(C \log k)$ for general graphs. We will refer to this last result as an Abstract Rounding Algorithm.

We also give a polynomial time construction of $O(\log k/\log\log k)$ quality flow-sparsifiers (and consequently cut-sparsifiers as well). These were previously only known to exist, and finding a polynomial time construction was still open. We accomplish this by performing a lifting (inspired by Earth-mover constraints) on an appropriate linear program. This lifting allows us to implicitly enforce a constraint that previously was difficult to enforce, and required an approximate separation oracle rather than an exact separation oracle. We give the details in Section 5.

2 Maximum Concurrent Flow 

An instance of the maximum concurrent flow problem consists of an undirected graph $G = (V,E)$, a capacity function $c : E \to \mathbb{R}^+$ that assigns a non-negative capacity to each edge, and a set of demands $\{(s_i, t_i, f_i)\}$ where $s_i, t_i \in V$ and $f_i$ is a non-negative demand. We denote $K = \cup_i \{s_i, t_i\}$. The maximum concurrent flow question asks, given such an instance, what is the largest fraction of the demand that can be simultaneously satisfied? This problem can be formulated as a polynomial-sized linear program, and hence can be solved in polynomial time. However, a more natural formulation of the maximum concurrent flow problem can be written using an exponential number of variables.

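For any $a,b \in V$ let $P_{a,b}$ be the set of all (simple) paths from $a$ to $b$ in $G$. Then the maximum concurrent flow problem and the corresponding dual can be written as:

$$
\begin{array}{ll}
\max\ \lambda & \min\ \sum_e d(e)\, c(e) \\
\text{s.t.} & \text{s.t.} \\
\sum_{P \in P_{s_i,t_i}} x(P) \ge \lambda f_i & \forall P \in P_{s_i,t_i}:\ \sum_{e \in P} d(e) \ge D(s_i,t_i) \\
\sum_{P \ni e} x(P) \le c(e) & \sum_i D(s_i,t_i)\, f_i \ge 1 \\
x(P) \ge 0 & d(e) \ge 0,\ D(s_i,t_i) \ge 0
\end{array}
$$

For a maximum concurrent flow problem, let $\lambda^*$ denote the optimum.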

Let $|K| = k$. Then for a given set of demands $\{s_i, t_i, f_i\}$, we associate a vector $\vec{f} \in \mathbb{R}^{\binom{k}{2}}$ in which each coordinate corresponds to a pair $(x,y) \in \binom{K}{2}$ and the value $\vec{f}_{x,y}$ is defined as the demand $f_i$ for the terminal pair $s_i = x, t_i = y$.

Definition 3. We denote $\mathrm{cong}_G(\vec{f}) = \frac{1}{\lambda^*}$.

Or equivalently $\mathrm{cong}_G(\vec{f})$ is the minimum $C$ s.t. $\vec{f}$ can be routed in $G$ and the total flow on any edge is at most $C$ times the capacity of the edge.

Throughout we will use the notation that graphs $G_1, G_2$ (on the same node set) are "summed" by taking the union of their edge sets (and allowing parallel edges).

2.1 Cut Sparsifiers 

Suppose we are given an undirected, capacitated graph $G = (V,E)$ and a set $K \subset V$ of terminals of size $k$. Let $h : 2^V \to \mathbb{R}^+$ denote the cut function of $G$: $h(A) = \sum_{(u,v) \in E \text{ s.t. } u \in A, v \in V-A} c(u,v)$. We define the function $h_K : 2^K \to \mathbb{R}^+$ which we refer to as the terminal cut function on $K$: $h_K(U) = \min_{A \subset V \text{ s.t. } A \cap K = U} h(A)$.

Definition 4. $G'$ is a cut-sparsifier for the graph $G = (V,E)$ and the terminal set $K$ if $G'$ is a graph on just the terminal set $K$ (i.e. $G' = (K,E')$) and if the cut function $h' : 2^K \to \mathbb{R}^+$ of $G'$ satisfies (for all $U \subset K$)

$$h_K(U) \le h'(U)$$



We can define a notion of quality for any particular cut-sparsifier:

Definition 5. The quality of a cut-sparsifier $G'$ is defined as

$$\max_{U \subset K} \frac{h'(U)}{h_K(U)}$$



We will abuse notation and define $\frac{0}{0} = 1$ so that when $U$ is disconnected from $K - U$ in $G$ or if $U = \emptyset$ or $U = K$, the ratio of the two cut functions is 1 and we ignore these cases when computing the worst-case ratio and consequently the quality of a cut-sparsifier.
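
The definitions of the terminal cut function and of quality can be checked by brute force on a tiny instance. The following is only an illustrative sketch: the function names, the star graph and the two candidate sparsifiers are assumptions made for this example, and the enumeration is exponential in the number of nodes.

```python
from itertools import combinations

def cut_value(cap, side):
    """h(side): total capacity of edges with exactly one endpoint in `side`."""
    return sum(c for (u, v), c in cap.items() if (u in side) != (v in side))

def terminal_cut(cap, nodes, terminals, U):
    """h_K(U): minimum of h(A) over all A with A ∩ K = U (brute force)."""
    steiner = [v for v in nodes if v not in terminals]
    return min(
        cut_value(cap, set(U) | set(extra))
        for r in range(len(steiner) + 1)
        for extra in combinations(steiner, r)
    )

def quality(cap, nodes, terminals, sparsifier_cap):
    """Worst ratio h'(U)/h_K(U) over nonempty proper U; None if some cut is under-estimated.

    Cuts with h_K(U) = 0 are skipped (the 0/0 := 1 convention); this sketch
    assumes G is connected, so such cuts do not arise in the example below.
    """
    terms = sorted(terminals)
    worst = 1.0
    for r in range(1, len(terms)):
        for U in combinations(terms, r):
            h_k = terminal_cut(cap, nodes, terminals, U)
            h_prime = cut_value(sparsifier_cap, set(U))
            if h_prime < h_k:
                return None  # violates h_K(U) <= h'(U), so not a cut-sparsifier
            if h_k > 0:
                worst = max(worst, h_prime / h_k)
    return worst

# Hypothetical example: a unit-capacity star with centre x and terminals a, b, c.
nodes = ["a", "b", "c", "x"]
terminals = {"a", "b", "c"}
star = {("a", "x"): 1.0, ("b", "x"): 1.0, ("c", "x"): 1.0}
triangle = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "c"): 0.5}
contracted = {("a", "b"): 1.0, ("a", "c"): 1.0}  # x merged into terminal a

print(quality(star, nodes, terminals, triangle))    # quality of the triangle
print(quality(star, nodes, terminals, contracted))  # quality of the contraction
```

On this toy instance the half-capacity triangle reproduces every terminal cut exactly, while contracting the centre into one terminal over-estimates some cuts by a factor of 2.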

2.2 0-Extensions 

Definition 6. $f : V \to K$ is a 0-extension if for all $a \in K$, $f(a) = a$.

So a 0-extension $f$ is a clustering of the nodes in $V$ into sets, with the property that each set contains exactly one terminal.

Definition 7. Given a graph $G = (V,E)$ and a set $K \subset V$, and a 0-extension $f$, $G_f = (K,E_f)$ is a capacitated graph in which for all $a,b \in K$, the capacity $c_f(a,b)$ of edge $(a,b) \in E_f$ is

$$c_f(a,b) = \sum_{(u,v) \in E \text{ s.t. } f(u)=a,\, f(v)=b} c(u,v)$$

3 Lower Bounds for Cut Sparsifiers

Consider the following construction for a graph $G$. Let $Y$ be the hypercube of size $2^d$ for $d = \log k$. Then for every node $y_s \in Y$ (i.e. $s \in \{0,1\}^d$), we add a terminal $z_s$ and connect the terminal $z_s$ to $y_s$ using an edge of capacity $\sqrt{d}$. All the edges in the hypercube are given capacity 1. We'll use this instance to show two lower bounds, one for 0-extension cut sparsifiers and the other for arbitrary cut sparsifiers.

3.1 Lower bound for Cut Sparsifiers from 0-extensions

In this subsection, we give an $\Omega(\sqrt{d})$ integrality gap for the semi-metric relaxation of the 0-extension problem on this graph, even when the semi-metric (actually on all of $V$) is $\ell_1$. Such a bound is actually implicit in the work of ifTTIl too. Also, we show a strong duality between the worst case integrality gap for the semi-metric relaxation (when the semi-metric on $V$ must be $\ell_1$) and the quality of the best cut-sparsifier that can result from contractions. This gives an $\Omega(\sqrt{\log k})$ lower bound on how well a distribution on 0-extensions can approximate the minimum cuts in $G$.

Also, given the graph $G = (V,E)$, a set $K \subset V$ of terminals, and a semi-metric $D$ on $K$, we define the 0-extension problem as:

Definition 8. The 0-Extension Problem is defined as

$$\min_{\text{0-extensions } f} \sum_{(u,v) \in E} c(u,v)\, D(f(u), f(v))$$

We denote $OPT(G,K,D)$ as the value of this optimum.
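
The contracted graph $G_f$ and the 0-extension objective can be illustrated on a very small instance. The following is a sketch under stated assumptions: the helper names, the star graph and the uniform metric $D$ are invented for this example, and the search simply enumerates every 0-extension, which is only feasible for a handful of non-terminal nodes.

```python
from itertools import product

def contract(cap, f):
    """Capacities of G_f: c_f(a,b) sums c(u,v) over edges with f(u)=a, f(v)=b."""
    out = {}
    for (u, v), c in cap.items():
        a, b = sorted((f[u], f[v]))
        if a != b:  # edges inside one cluster disappear
            out[(a, b)] = out.get((a, b), 0.0) + c
    return out

def zero_extension_opt(cap, nodes, terminals, D):
    """Brute-force minimum of sum_{(u,v)} c(u,v) * D(f(u), f(v)) over all 0-extensions f."""
    terms = sorted(terminals)
    steiner = [v for v in nodes if v not in terminals]
    best_cost, best_f = float("inf"), None
    for choice in product(terms, repeat=len(steiner)):
        f = {t: t for t in terms}          # terminals are fixed points of f
        f.update(zip(steiner, choice))     # every other node picks a terminal
        cost = sum(c * D[pair] for pair, c in contract(cap, f).items())
        if cost < best_cost:
            best_cost, best_f = cost, f
    return best_cost, best_f

# Hypothetical example: unit-capacity star, uniform metric on the terminals.
nodes = ["a", "b", "c", "x"]
terminals = {"a", "b", "c"}
star = {("a", "x"): 1.0, ("b", "x"): 1.0, ("c", "x"): 1.0}
D = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0}

opt, f = zero_extension_opt(star, nodes, terminals, D)
print(opt, f, contract(star, f))
```

The cost is evaluated on $G_f$ rather than on $G$; the two agree because $D(a,a) = 0$, so only edges whose endpoints are mapped to different terminals contribute.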



Definition 9. Let $\Delta_U$ denote the cut-metric in which $\Delta_U(u,v) = 1_{|U \cap \{u,v\}| = 1}$.



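Also, given a partition $\mathcal{P}$ of $V$, we will refer to $\Delta_{\mathcal{P}}$ as the partition metric (induced by $\mathcal{P}$) which is 1 if $u$ and $v$ are contained in different subsets of the partition $\mathcal{P}$, and is 0 otherwise.

$$
\begin{array}{ll}
\min & \sum_{(u,v) \in E} c(u,v)\, \delta(u,v) \\
\text{s.t.} & \delta \text{ is a semi-metric on } V \\
 & \forall t,t' \in K:\ \delta(t,t') = D(t,t')
\end{array}
$$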

We refer to this linear program as the Semi-Metric Relaxation. For a particular instance $(G,K,D)$ of the 0-extension problem, we denote the optimal solution to this linear program as $OPT_{sm}(G,K,D)$.

Theorem 5. Ml/

$$OPT_{sm}(G,K,D) \le OPT(G,K,D) \le O\left(\frac{\log k}{\log\log k}\right) OPT_{sm}(G,K,D)$$

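If we are given a semi-metric $D$ which is $\ell_1$, we can additionally define a stronger (exponentially) sized linear program.

$$
\begin{array}{ll}
\min & \sum_U \delta(U)\, h(U) \\
\text{s.t.} & \forall t,t' \in K:\ \sum_U \delta(U)\, \Delta_U(t,t') = D(t,t')
\end{array}
$$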

We will refer to this linear program as the Cut-Cut Relaxation. For a particular instance $(G,K,D)$ of the 0-extension problem, we denote the optimal solution to this linear program as $OPT_{cc}(G,K,D)$.

The value of this linear program is that an upper bound on the integrality gap of this linear program (for a particular graph $G$ and a set of terminals $K$) gives an upper bound on the quality of cut-sparsifiers. In fact, a stronger statement is true, and the quality of the best cut-sparsifier that can be achieved through contractions will be exactly equal to the maximum integrality gap of this linear program. The upper bound is given in ITSll, and here we exhibit a strong duality:

Definition 10. The Contraction Quality of $G, K$ is defined to be the minimum $\alpha$ such that there is a distribution $\gamma$ on 0-extensions and $H = \sum_f \gamma(f) G_f$ is an $\alpha$-quality cut-sparsifier.

Lemma 1. Let $\nu$ be the maximum integrality gap of the Cut-Cut Relaxation for a particular graph $G = (V,E)$, a particular set $K \subset V$ of terminals, over all $\ell_1$ semi-metrics $D$ on $K$. Then the Contraction Quality of $G, K$ is exactly $\nu$.

Proof: Let $\alpha$ be the Contraction Quality of $G, K$. Then implicitly in fTF], $\alpha \le \nu$. Suppose $\gamma$ is a distribution on 0-extensions s.t. $H = \sum_f \gamma(f) G_f$ is an $\alpha$-quality cut-sparsifier. Given any semi-metric $D$ on $K$, we can solve the Cut-Cut Linear Program given above. Notice that any cut $(U, V-U)$ that is assigned positive weight in an optimal solution must be the minimum cut separating $U \cap K = A$ from $K - A = (V-U) \cap K$ in $G$. If not, we could replace this cut $(U, V-U)$ with the minimum cut separating $A$ from $K-A$ without affecting the feasibility and simultaneously reducing the cost of the solution. So for all $U$ for which $\delta(U) > 0$, $h(U) = h_K(U \cap K)$.

Consider then the cost of the semi-metric $D$ against the cut-sparsifier $H$ which is defined to be $\sum_{(a,b)} c_H(a,b) D(a,b) = \sum_f \gamma(f) \sum_{(a,b)} c_f(a,b) D(a,b)$ which is just the average cost of $D$ against $G_f$ where $f$ is sampled from the distribution $\gamma$. The Cut-Cut Linear Program gives a decomposition of $D$ into a weighted sum of cut-metrics, i.e. $D(a,b) = \sum_U \delta(U) \Delta_U(a,b)$. Also, the cost of $D$ against $H$ is linear in $D$ so this implies that

$$\sum_{(a,b)} c_H(a,b) D(a,b) = \sum_{(a,b)} \sum_U c_H(a,b)\, \delta(U)\, \Delta_U(a,b) = \sum_U \delta(U)\, h'(U \cap K)$$

In the last line, we use $\sum_{(a,b)} c_H(a,b) \Delta_U(a,b) = h'(U \cap K)$. Then

$$\sum_{(a,b)} c_H(a,b) D(a,b) \le \sum_U \delta(U)\, \alpha\, h_K(U \cap K) = \alpha\, OPT_{cc}(G,K,D)$$

In the inequality, we have used the fact that $H$ is an $\alpha$-quality cut-sparsifier, and in the last line we have used that $\delta(U) > 0$ implies that $h(U) = h_K(U \cap K)$. This completes the proof because the average cost of $D$ against $G_f$ where $f$ is sampled from $\gamma$ is at most $\alpha\, OPT_{cc}(G,K,D)$, so there must be some $f$ s.t. the cost against $D$ is at most $\alpha\, OPT_{cc}(G,K,D)$. □

We will use this strong duality between the Cut-Cut Relaxation and the Contraction Quality to show that for the graph $G$ given above, no distribution on 0-extensions gives better than an $\Omega(\sqrt{\log k})$ quality cut-sparsifier, and all we need to accomplish this is to demonstrate an integrality gap on the example for the Cut-Cut Relaxation.

Let's repeat the construction of $G$ here. Let $Y$ be the hypercube of size $2^d$ for $d = \log k$. Then for every node $y_s \in Y$ (i.e. $s \in \{0,1\}^d$), we add a terminal $z_s$ and connect the terminal $z_s$ to $y_s$ using an edge of capacity $\sqrt{d}$. All the edges in the hypercube are given capacity 1.

Then consider the distance assignment to the edges: Each edge connecting a terminal to a node in the hypercube, i.e. an edge of the form $(z_s, y_s)$, is assigned distance $\sqrt{d}$ and every other edge in the graph is assigned distance 1. Then let $\sigma$ be the shortest path metric on $V$ given these edge distances.

Claim 1. $\sigma$ is an $\ell_1$ semi-metric on $V$, and in fact there is a weighted combination $\delta$ of cuts s.t. $\sigma(u,v) = \sum_U \delta(U) \Delta_U(u,v)$ and $\sum_U \delta(U) h(U) = O(kd)$.

Proof: We can take $\delta(U) = 1$ for any cut $(U, V-U)$ s.t. $U = \{z_s \cup y_s \mid s_i = 1\}$, i.e. $U$ is the axis-cut corresponding to the $i$-th bit. We also take $\delta(U) = \sqrt{d}$ for each $U = \{z_s\}$. This set of weights will achieve $\sigma(u,v) = \sum_U \delta(U) \Delta_U(u,v)$, and also there are $d$ axis cuts each of which has capacity $h(U) = \frac{k}{2}$ and there are $k$ singleton cuts of weight $\sqrt{d}$ and capacity $\sqrt{d}$ so the total cost is $O(kd)$.

□ 
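
As a sanity check, the cut decomposition in Claim 1 can be verified numerically for a small dimension. The sketch below is not part of the original argument: the node encoding `("y", s)` / `("z", s)`, the function names and the choice $d = 3$ are assumptions for illustration. It builds the instance, computes $\sigma$ with Floyd-Warshall, and compares it against the weighted sum of axis cuts and singleton terminal cuts.

```python
import math
from itertools import product

def build_instance(d):
    """Hypercube nodes ('y', s) and terminals ('z', s); returns (cube, capacities)."""
    cube = list(product((0, 1), repeat=d))
    cap = {}
    for s in cube:
        cap[(("z", s), ("y", s))] = math.sqrt(d)   # terminal edge, capacity sqrt(d)
        for i in range(d):
            if s[i] == 0:                           # add each hypercube edge once
                t = s[:i] + (1,) + s[i + 1:]
                cap[(("y", s), ("y", t))] = 1.0
    return cube, cap

def shortest_paths(nodes, length):
    """All-pairs shortest paths (Floyd-Warshall) for undirected edge lengths."""
    dist = {(u, v): (0.0 if u == v else math.inf) for u in nodes for v in nodes}
    for (u, v), w in length.items():
        dist[(u, v)] = dist[(v, u)] = w
    for m in nodes:
        for u in nodes:
            for v in nodes:
                if dist[(u, m)] + dist[(m, v)] < dist[(u, v)]:
                    dist[(u, v)] = dist[(u, m)] + dist[(m, v)]
    return dist

def check_claim(d):
    cube, cap = build_instance(d)
    nodes = [("z", s) for s in cube] + [("y", s) for s in cube]
    # Edge distances: sqrt(d) on terminal edges, 1 on hypercube edges.
    length = {e: (math.sqrt(d) if e[0][0] == "z" else 1.0) for e in cap}
    sigma = shortest_paths(nodes, length)
    # Axis cuts with weight 1, singleton terminal cuts with weight sqrt(d).
    cuts = [({v for v in nodes if v[1][i] == 1}, 1.0) for i in range(d)]
    cuts += [({("z", s)}, math.sqrt(d)) for s in cube]
    decomposes = all(
        math.isclose(sigma[(u, v)],
                     sum(w for U, w in cuts if (u in U) != (v in U)),
                     abs_tol=1e-9)
        for u in nodes for v in nodes
    )
    cost = sum(w * sum(c for (a, b), c in cap.items() if (a in U) != (b in U))
               for U, w in cuts)
    return decomposes, cost, len(cube)

decomposes, cost, k = check_claim(3)
print(decomposes, cost, k)
```

For this instance the total cost works out to $d \cdot \frac{k}{2} + k \cdot \sqrt{d} \cdot \sqrt{d} = \frac{3}{2} kd$, consistent with the $O(kd)$ bound in the claim.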

Yet if we take $D$ equal to the restriction of $\sigma$ on $K$, then $OPT(G,K,D) = \Omega(kd^{3/2})$:

Lemma 2. $OPT(G,K,D) = \Omega(kd^{3/2})$

Proof: Consider any 0-extension $f$. We can define the weight of any terminal $a$ as $weight_f(a) = |f^{-1}(a)| = |\{v \mid f(v) = a\}|$. Then $\sum_a weight_f(a) = n$ because each node in $V$ is assigned to some terminal. We can define a terminal as heavy with respect to $f$ if $weight_f(a) \ge \sqrt{k}$ and light otherwise. Obviously, $\sum_a weight_f(a) = \sum_{a \text{ s.t. } a \text{ is light}} weight_f(a) + \sum_{a \text{ s.t. } a \text{ is heavy}} weight_f(a)$, so the sum of the sizes of either all heavy terminals or of all light terminals is at least $\frac{n}{2} = \Omega(k)$.

Suppose that $\sum_{a \text{ s.t. } a \text{ is light}} weight_f(a) = \Omega(k)$. For any pair of terminals $a,b$, $D(a,b) \ge \sqrt{d}$. Also for any light terminal $a$, $f^{-1}(a) - \{a\}$ is a subset of the Hypercube of at most $\sqrt{k}$ nodes, and the small-set expansion of the Hypercube implies that the number of edges out of this set is at least $\Omega(weight_f(a) \log k) = \Omega(weight_f(a)\, d)$. Each such edge pays at least $\sqrt{d}$ cost, because $D(a,b) \ge \sqrt{d}$ for all pairs of terminals. So this implies that the total cost of the 0-extension $f$ is at least $\sum_{a \text{ s.t. } a \text{ is light}} \Omega(weight_f(a)\, d^{3/2})$.

Suppose that $\sum_{a \text{ s.t. } a \text{ is heavy}} weight_f(a) = \Omega(k)$. Consider any heavy terminal $z_t$, and consider any $y_s \in f^{-1}(z_t)$ with $t \ne s$. Then the edge $(y_s, z_s)$ has capacity $\sqrt{d}$ and pays a total distance of $D(z_t, z_s) \ge \sigma(y_t, y_s)$. Consider any set $U$ of nodes in the Hypercube. If we attempt to pack these nodes so as to minimize $\sum_{y_s \in U} \sigma(y_s, y_t)$ for some fixed node $y_t$, then the packing that minimizes the quantity is an appropriately sized Hamming ball centered at $y_t$. In a Hamming ball centered at the node $y_t$ of at least $\sqrt{k}$ total nodes, the average distance from $y_t$ is $\Omega(\log k) = \Omega(d)$, and so this implies that $\sum_{y_s \in f^{-1}(z_t)} D(z_t, z_s) \ge \sum_{y_s \in f^{-1}(z_t)} \sigma(y_t, y_s) \ge \Omega(weight_f(z_t)\, d)$. Each such edge has capacity $\sqrt{d}$ so the total cost of the 0-extension $f$ is at least $\sum_{a \text{ s.t. } a \text{ is heavy}} \Omega(weight_f(a)\, d^{3/2})$. □

And of course using our strong duality result, this integrality gap implies that any cut-sparsifier that results from a distribution on 0-extensions has quality at least $\Omega(\sqrt{\log k})$, and this matches the current best lower bound on the integrality gap of the Semi-Metric Relaxation for 0-extension, so in principle this could be the best lower bound we could hope for (if the integrality gap of the Semi-Metric Relaxation is in fact $O(\sqrt{\log k})$ then there are always cut-sparsifiers that result from a distribution on 0-extensions and have quality at most $O(\sqrt{\log k})$).

3.2 Lower bounds for Arbitrary Cut sparsifiers 

We will in fact use the above example to give a lower bound on the quality of any cut-sparsifier. We will show that for the above graph, no cut-sparsifier achieves quality better than $\Omega(\log^{1/4} k)$, and this gives an exponential improvement over the previous lower bound on the quality of flow-sparsifiers (which is even a stronger requirement for sparsifiers, and hence a weaker lower bound).

The particular example $G$ that we gave above has many symmetries, and we can use these symmetries to justify considering only symmetric cut-sparsifiers. The fact that these cut-sparsifiers can be assumed without loss of generality to have nice symmetry properties means that any such cut-sparsifier $H$ is characterized by a much smaller set of variables rather than one variable for every pair of terminals. In fact, we will be able to reduce the number of variables from $\binom{k}{2}$ to $\log k$. This in turn will allow us to consider a much smaller family of cuts in $G$ in order to derive that the system is infeasible. In fact, we will only consider sub-cube cuts (cuts in which $U = \{z_s \cup y_s \mid s = [0,0,0,\ldots,0,*,*,*]\}$) and the Hamming ball $U = \{z_s \cup y_s \mid d(y_s, y_0) \le \frac{d}{2}\}$.

Definition 11. The operation $J_s$ for some $s \in \{0,1\}^d$ is defined as $J_s(y_t) = y_{t+s \bmod 2}$ and $J_s(z_t) = z_{t+s \bmod 2}$. Also let $J_s(U) = \cup_{u \in U} J_s(u)$.

Definition 12. For any permutation $\pi : [d] \to [d]$, $\pi(s) = [s_{\pi(1)}, s_{\pi(2)}, \ldots, s_{\pi(d)}]$. Then the operation $J_\pi$ for any permutation $\pi$ is defined as $J_\pi(y_t) = y_{\pi(t)}$ and $J_\pi(z_t) = z_{\pi(t)}$. Also let $J_\pi(U) = \cup_{u \in U} J_\pi(u)$.

Claim 2. For any subset $U \subset V$ and any $s \in \{0,1\}^d$, $h(U) = h(J_s(U))$.

Claim 3. For any subset $U \subset V$ and any permutation $\pi : [d] \to [d]$, $h(U) = h(J_\pi(U))$.

Both of these operations are automorphisms of the weighted graph G and also send the set K to K. 
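
The invariance of the cut function under these two families of operations can be checked exhaustively for a very small dimension. The sketch below is illustrative only: the node encoding, the helper names and the choice $d = 2$ (so that all subsets of the 8 nodes can be enumerated) are assumptions of this example.

```python
import math
from itertools import product, permutations, combinations

def build_instance(d):
    """Capacities of G: terminal edges (z_s, y_s) of capacity sqrt(d), cube edges of capacity 1."""
    cube = list(product((0, 1), repeat=d))
    cap = {}
    for s in cube:
        cap[(("z", s), ("y", s))] = math.sqrt(d)
        for i in range(d):
            if s[i] == 0:
                t = s[:i] + (1,) + s[i + 1:]
                cap[(("y", s), ("y", t))] = 1.0
    return cube, cap

def h(cap, U):
    """Cut function: capacity of edges with exactly one endpoint in U."""
    return sum(c for (a, b), c in cap.items() if (a in U) != (b in U))

def shift(node, s):
    """J_s: add s to the label modulo 2, keeping the node kind (y or z)."""
    kind, t = node
    return (kind, tuple((a + b) % 2 for a, b in zip(t, s)))

def permute(node, pi):
    """J_pi: permute the coordinates of the label, keeping the node kind."""
    kind, t = node
    return (kind, tuple(t[pi[i]] for i in range(len(t))))

def cuts_invariant(d):
    cube, cap = build_instance(d)
    nodes = [("z", s) for s in cube] + [("y", s) for s in cube]
    maps = [lambda u, s=s: shift(u, s) for s in cube]
    maps += [lambda u, pi=pi: permute(u, pi) for pi in permutations(range(d))]
    for r in range(len(nodes) + 1):
        for U in combinations(nodes, r):
            base = h(cap, set(U))
            for J in maps:
                if not math.isclose(base, h(cap, {J(u) for u in U}), abs_tol=1e-9):
                    return False
    return True

print(cuts_invariant(2))
```

Both maps preserve the node kind, so terminals are sent to terminals, and they map hypercube edges to hypercube edges because they preserve Hamming distance between labels.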

Lemma 3. If there is a cut-sparsifier H for G which has quality a, then there is a cut-sparsifier H' which has 
quality at most a and is invariant under the automorphisms of the weighted graph G that send K to K. 

Proof: Given the cut-sparsifier $H$, we can apply an automorphism $J$ to $G$, and because $h(U) = h(J(U))$, this implies that $h_K(A) = \min_{U \text{ s.t. } U \cap K = A} h(U) = \min_{U \text{ s.t. } U \cap K = A} h(J(U))$. Also $J(U \cap K) = J(U) \cap J(K) = J(U) \cap K$ so we can re-write this last line as

$$\min_{U \text{ s.t. } U \cap K = A} h(J(U)) = \min_{U' \text{ s.t. } J(U') \cap K = J(A)} h(J(U'))$$

And if we set $U' = J^{-1}(U)$ then this last line becomes equivalent to

$$\min_{U' \text{ s.t. } J(U') \cap K = J(A)} h(J(U')) = \min_{U \text{ s.t. } U \cap K = J(A)} h(U) = h_K(J(A))$$

So the result is that $h_K(A) = h_K(J(A))$ and this implies that if we do not re-label $H$ according to $J$, but we do re-label $G$, then for any subset $A$, we are checking whether the minimum cut in $G$ re-labeled according to $J$, that separates $A$ from $K-A$, is close to the cut in $H$ that separates $A$ from $K-A$. The minimum cut in the re-labeled $G$ that separates $A$ from $K-A$ is just the minimum cut in $G$ that separates $J^{-1}(A)$ from $K - J^{-1}(A)$ (because the set $J^{-1}(A)$ is the set that is mapped to $A$ under $J$). So $H$ is an $\alpha$-quality cut-sparsifier for the re-labeled $G$ iff for all $A$:

$$h_K(A) = h_K(J^{-1}(A)) \le h'(A) \le \alpha\, h_K(J^{-1}(A)) = \alpha\, h_K(A)$$

which is of course true because $H$ is an $\alpha$-quality cut-sparsifier for $G$.

So alternatively, we could have applied the automorphism $J^{-1}$ to $H$ and not re-labeled $G$, and this resulting graph $H_{J^{-1}}$ would also be an $\alpha$-quality cut-sparsifier for $G$. Also, since the set of $\alpha$-quality cut-sparsifiers is convex (it is defined by a system of inequalities), we can find a cut-sparsifier $H'$ that has quality at most $\alpha$ and is a fixed point of the group of automorphisms, and hence invariant under the automorphisms of $G$ as desired. □

Corollary 1. If $\alpha$ is the best quality cut-sparsifier for the above graph $G$, then there is an $\alpha$ quality cut-sparsifier $H$ in which the capacity between two terminals $z_s$ and $z_t$ is only dependent on the Hamming distance $Hamm(s,t)$.

Proof: Given any quadruple $z_s, z_t$ and $z_{s'}, z_{t'}$ s.t. $Hamm(s,t) = Hamm(s',t')$, there is a concatenation of operations from $J_s$, $J_\pi$ that sends $s$ to $s'$ and $t$ to $t'$. This concatenation of operations $J$ is in the group of automorphisms that send $K$ to $K$, and hence we can assume that $H$ is invariant under this operation which implies that $c_H(s,t) = c_H(s',t')$. □

One can regard any cut-sparsifier (not just ones that result from contractions) as a set of variables, one for
the capacity of each edge in H. Then the constraints that H be an $\alpha$-quality cut-sparsifier are just a system of
inequalities, one for each subset $A \subset K$ that enforces that the cut in H is at least as large as the minimum cut
in G (i.e. $h'(A) \ge h_K(A)$) and one enforcing that the cut is not too large (i.e. $h'(A) \le \alpha h_K(A)$). Then in general,
one can derive lower bounds on the quality of cut-sparsifiers by showing that if $\alpha$ is not large enough, then this
system of inequalities is infeasible, meaning that there is no cut-sparsifier achieving quality $\alpha$. Unlike the above
argument, this form of a lower bound is much stronger and does not assume anything about how the cut-sparsifier
is generated.
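The system of inequalities described above can be checked by brute force on very small instances. The following is a minimal sketch, assuming graphs are given as dictionaries mapping edges `(u, v)` to capacities and that G is connected; the function names `cut_value`, `terminal_cut` and `quality` are illustrative and not from the paper. It enumerates every terminal subset A, computes $h_K(A)$ by trying all ways of placing the non-terminal nodes, and returns the smallest $\alpha$ for which $h_K(A) \le h'(A) \le \alpha h_K(A)$ holds for all A.

```python
from itertools import combinations

def cut_value(cap, side):
    """Total capacity of edges with exactly one endpoint in `side`."""
    return sum(c for (u, v), c in cap.items() if (u in side) != (v in side))

def terminal_cut(cap_G, nodes, terminals, A):
    """h_K(A): minimum over all U with U ∩ K = A of the cut (U, V - U) in G."""
    A = set(A)
    steiner = [v for v in nodes if v not in terminals]
    best = float("inf")
    for r in range(len(steiner) + 1):
        for extra in combinations(steiner, r):
            best = min(best, cut_value(cap_G, A | set(extra)))
    return best

def quality(cap_G, nodes, terminals, cap_H):
    """Smallest alpha with h_K(A) <= h'(A) <= alpha * h_K(A) for every proper
    non-empty A, or None if some cut in H is smaller than the minimum cut in G.
    Assumes G is connected, so that h_K(A) > 0."""
    terms = sorted(terminals)
    alpha = 1.0
    for r in range(1, len(terms)):
        for A in combinations(terms, r):
            h_K = terminal_cut(cap_G, nodes, terminals, A)
            h_H = cut_value(cap_H, set(A))
            if h_H < h_K - 1e-12:
                return None
            alpha = max(alpha, h_H / h_K)
    return alpha

# Toy instance (an assumption for illustration): a star with three terminals.
star = {("a", "x"): 1.0, ("b", "x"): 1.0, ("c", "x"): 1.0}
triangle = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "c"): 0.5}
print(quality(star, ["a", "b", "c", "x"], {"a", "b", "c"}, triangle))
```

The enumeration is exponential in both the number of terminals and the number of non-terminals, so this is only a way to experiment with the definition, not a practical algorithm.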

Theorem 1. For $\alpha = \Omega(\log^{1/4} k)$, there is no cut-sparsifier H for G which has quality at most $\alpha$.

Proof (sketch): Assume that there is a cut-sparsifier H' of quality at most $\alpha$. Then using the above corollary, there
is a cut-sparsifier H of quality at most $\alpha$ in which the weight from a to b is only a function of Hamm(a,b). Then
for each $i \in [d]$, we can define a variable $w_i$ as the total weight of edges of length i incident to any terminal, i.e.

$$w_i = \sum_{b \text{ s.t. } Hamm(a,b) = i} c_H(a,b).$$

For simplicity, here we will assume that all cuts in the sparsifier H are at most the cost of the corresponding
minimum cut in G and at least $\frac{1}{\alpha}$ times the corresponding minimum cut. This of course is an identical set of
constraints that we get from dividing the standard definition that we use in this paper for $\alpha$-quality cut-sparsifiers
by $\alpha$.

We need to derive a contradiction from the system of inequalities that characterize the set of $\alpha$-quality
cut-sparsifiers for G. As we noted, we will consider only the sub-cube cuts (cuts in which
$U = \{z_s \cup y_s \mid s = [0,0,0,\ldots,0,*,*,\ldots,*]\}$) and the Hamming ball $U = \{z_s \cup y_s \mid d(y_s, y_0) \le \frac{d}{2}\}$, which we refer to as the Majority Cut.

Consider the Majority Cut: There are $\Theta(k)$ terminals on each side of the cut, and most terminals have Hamming
weight close to $\frac{d}{2}$. In fact, we can sort the terminals by Hamming weight and each weight level around Hamming
weight $\frac{d}{2}$ has roughly a $\Theta(\frac{1}{\sqrt{d}})$ fraction of the terminals. Any terminal of Hamming weight $\frac{d}{2} - \sqrt{i}$ has roughly a
constant fraction of their weight $w_i$ crossing the cut in H, because choosing a random terminal at Hamming distance
i from any such terminal corresponds to flipping i coordinates at random, and throughout this process there are
almost an equal number of 1s and 0s, so this process is well-approximated by a random walk starting at $\sqrt{i}$ on
the integers, which equally likely moves forwards and backwards at each step for i total steps, and asking the
probability that the walk ends at a negative integer.



In particular, for any terminal of Hamming weight $\frac{d}{2} - t$, the fraction of the weight $w_i$ that crosses the Majority
Cut is $O(\exp\{-\frac{t^2}{i}\})$. So the total weight of length i edges (i.e. edges connecting two terminals at Hamming distance
i) cut by the Majority Cut is $O(w_i |\{z_s \mid Hamm(s,0) \ge \frac{d}{2} - \sqrt{i}\}|) = O(w_i \sqrt{i/d})\,k$ because each weight level close to the
boundary of the Majority Cut contains roughly a $\Theta(\frac{1}{\sqrt{d}})$ fraction of the terminals. So the total weight of edges
crossing the Majority Cut in H is $O(k \sum_{i=1}^{d} w_i \sqrt{i/d})$.

And the total weight crossing the minimum cut in G separating $A = \{z_s \mid d(y_s, y_0) \le \frac{d}{2}\}$ from K - A is $\Theta(k\sqrt{d})$. And
because the cuts in H are at least $\frac{1}{\alpha}$ times the corresponding minimum cut in G, this implies
$\sum_{i=1}^{d} w_i \sqrt{i/d} \ge \Omega(\frac{\sqrt{d}}{\alpha})$.

Next, we consider the set of sub-cube cuts. For $j \in [d]$, let $A_j = \{z_s \mid s_1 = 0, s_2 = 0, \ldots, s_j = 0\}$. Then the minimum
cut in G separating $A_j$ from $K - A_j$ is $\Theta(|A_j| \min(j, \sqrt{d}))$, because each node in the Hypercube which has the first
j coordinates as zero has j edges out of the sub-cube, and when $j > \sqrt{d}$, we would instead choose cutting each
terminal $z_s \in A_j$ from the graph directly by cutting the edge $(y_s, z_s)$.

Also, for any terminal in $A_j$, the fraction of length i edges that cross the cut is approximately
$1 - (1 - \frac{j}{d})^i = \Theta(\min(\frac{ij}{d}, 1))$. So the constraints that each cut in H be at most the corresponding minimum cut in G give the
inequalities $\sum_{i=1}^{d} \min(\frac{ij}{d}, 1)\, w_i \le O(\min(j, \sqrt{d}))$.

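We refer to the above constraint as $B_j$. Multiplying each $B_j$ constraint by $j^{-3/2}$ and adding up the constraints yields
a linear combination of the variables $w_i$ on the left-hand side. The coefficient of any $w_i$ is

$$\sum_{j=1}^{d} \frac{\min(\frac{ij}{d}, 1)}{j^{3/2}} \ge \sum_{j=1}^{d/i} \frac{i/d}{j^{1/2}}$$

And using the Integration Rule this is $\Omega(\sqrt{i/d})$.

This implies that the coefficients of the constraint B resulting from adding up $j^{-3/2}$ times each $B_j$ for each $w_i$ are
at least a constant times the coefficient of $w_i$ in the Majority Cut Inequality. So we get

$$O\Big(\sum_{j=1}^{d} j^{-3/2} \min(j, \sqrt{d})\Big) \ge \sum_{i=1}^{d} w_i \sum_{j=1}^{d} j^{-3/2} \min(\tfrac{ij}{d}, 1) \ge \Omega\Big(\sum_{i=1}^{d} w_i \sqrt{i/d}\Big) \ge \Omega\Big(\frac{\sqrt{d}}{\alpha}\Big)$$

And we can evaluate the constant $\sum_{j=1}^{d} j^{-3/2} \min(j, \sqrt{d}) = \sum_{j=1}^{\sqrt{d}} j^{-1/2} + \sqrt{d} \sum_{j=\sqrt{d}+1}^{d} j^{-3/2}$ using the Integration
Rule; this evaluates to $O(d^{1/4})$. This implies $O(d^{1/4}) \ge \Omega(\frac{\sqrt{d}}{\alpha})$ and in particular this implies $\alpha \ge \Omega(d^{1/4})$. So the quality
of the best cut-sparsifier for G is at least $\Omega(\log^{1/4} k)$. □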
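The two sums that drive this argument can be sanity-checked numerically. The sketch below assumes the sub-cube constraints are weighted by $j^{-3/2}$, which is the weighting under which the coefficient of $w_i$ is $\Omega(\sqrt{i/d})$ and the combined right-hand side is $O(d^{1/4})$; the function names are illustrative. It prints the two normalized quantities, which should stay bounded away from 0 and infinity as d grows.

```python
import math

def combined_rhs(d):
    """sum_{j=1}^d j^(-3/2) * min(j, sqrt(d)): right-hand side after adding the weighted B_j."""
    return sum(j ** -1.5 * min(j, math.sqrt(d)) for j in range(1, d + 1))

def combined_coeff(d, i):
    """sum_{j=1}^d j^(-3/2) * min(i*j/d, 1): coefficient of w_i after adding the weighted B_j."""
    return sum(j ** -1.5 * min(i * j / d, 1.0) for j in range(1, d + 1))

for d in (2 ** 8, 2 ** 10, 2 ** 12):
    step = max(1, d // 64)
    rhs_ratio = combined_rhs(d) / d ** 0.25
    coeff_ratio = min(combined_coeff(d, i) / math.sqrt(i / d) for i in range(1, d + 1, step))
    # rhs_ratio bounded above  <=>  right-hand side is O(d^(1/4))
    # coeff_ratio bounded below <=>  coefficient of w_i is Omega(sqrt(i/d))
    print(d, round(rhs_ratio, 3), round(coeff_ratio, 3))
```

Dividing the combined constraint by the Majority Cut inequality then leaves a gap of order $\sqrt{d} / d^{1/4} = d^{1/4}$, which is where the exponent $1/4$ comes from.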

We note that this is the first super-constant lower bound on the quality of cut-sparsifiers. Recent work gives a
super-constant lower bound on the quality of flow-sparsifiers in an infinite family of expander-like graphs. However,
for this family there are constant-quality cut-sparsifiers. In fact, lower bounds for cut-sparsifiers imply lower
bounds for flow-sparsifiers, so we are able to improve the lower bound of $\Omega(\log\log k)$ in the previous work for
flow-sparsifiers by an exponential factor to $\Omega(\log^{1/4} k)$, and this is the first lower bound that is tight to within a
polynomial factor of the current best upper bound of $O(\frac{\log k}{\log\log k})$.

This bound is not as good as the lower bound we obtained earlier in the restricted case in which the cut-sparsifier
is generated as a convex combination of 0-extension graphs $G_f$. As we will demonstrate, there are
actually cut-sparsifiers that achieve quality $o(\sqrt{\log k})$ for G, and so in general restricting to convex combinations
of 0-extensions is sub-optimal, and we leave open the possibility that the ideas in this improved bound may result in
better constructions of cut (or flow)-sparsifiers that are able to beat the current best upper bound on the integrality
gap of the 0-extension linear program.



4 Noise Sensitive Cut-Sparsifiers 



In the appendix we give a brief introduction to the harmonic analysis of Boolean functions, along with formal
statements that we will use in the proof of our main theorem in this section.

4.1 A Candidate Cut-Sparsifier 

Here we give a cut-sparsifier H which will achieve quality $o(\sqrt{\log k})$ for the graph G given in Section 3, which
is asymptotically better than the best cut-sparsifier that can be generated from contractions.

As we noted, we can assume that the weight assigned between a pair of terminals in H, $c_H(a,b)$, is only a
function of the Hamming distance from a to b. In G, the minimum cut separating any singleton terminal $\{z_s\}$
from $K - \{z_s\}$ is just the cut that deletes the edge $(z_s, y_s)$. So the capacity of this cut is $\sqrt{d}$. We want a good
cut-sparsifier to approximately preserve this cut, so the total capacity incident to any terminal in H will also be
$\sqrt{d}$, i.e. $c'(\{z_s\}) = \sqrt{d}$.

We distribute this capacity among the other terminals as follows: We sample $t \sim_\rho s$, and allocate an infinitesimal
fraction of the total weight $\sqrt{d}$ to the edge $(z_s, z_t)$. Equivalently, the capacity of the edge connecting $z_s$ and $z_t$ is
just $\Pr_{u \sim_\rho t}[u = s]\,\sqrt{d}$. We choose $\rho = 1 - \frac{1}{\sqrt{d}}$. This choice of $\rho$ corresponds to flipping each bit in t with probability
$\Theta(\frac{1}{\sqrt{d}})$ when generating u from t. We prove that the graph H has cuts at most the corresponding minimum-cut in
G.
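The edge capacities of this candidate sparsifier have a simple closed form. The following minimal sketch assumes the usual convention for $u \sim_\rho t$, namely that each bit of t is flipped independently with probability $(1 - \rho)/2$; terminals are indexed by integers whose binary digits are the d coordinates, and the function names are illustrative.

```python
import math

def hamming(s, t):
    """Number of coordinates in which the bit strings s and t differ."""
    return bin(s ^ t).count("1")

def sparsifier_capacity(d, s, t):
    """Capacity of the edge (z_s, z_t) in H: sqrt(d) * Pr_{u ~_rho t}[u = s],
    with rho = 1 - 1/sqrt(d) and per-bit flip probability (1 - rho) / 2 (assumed convention)."""
    rho = 1.0 - 1.0 / math.sqrt(d)
    flip = (1.0 - rho) / 2.0
    i = hamming(s, t)
    return math.sqrt(d) * flip ** i * (1.0 - flip) ** (d - i)

d = 4
t = 0b0101
# Summing over all s (including s = t) recovers the total weight sqrt(d) allocated at z_t.
print(sum(sparsifier_capacity(d, s, t) for s in range(2 ** d)), math.sqrt(d))
```

The capacity depends on s and t only through Hamm(s,t), so H has the symmetry that the earlier corollary allows us to assume.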

This cut-sparsifier H has cuts at most the corresponding minimum-cut in G. In fact, a stronger statement is true:
H can be routed as a flow in G with congestion O(1). Consider the following explicit routing scheme for H: Route
the $\sqrt{d}$ total flow in H out of $z_s$ to the node $y_s$ in G. Now we need to route these flows through the Hypercube in
a way that does not incur too much congestion on any edge. Our routing scheme for routing the edge from $z_s$ to $z_t$
in H from $y_s$ to $y_t$ will be symmetric with respect to the edges in the Hypercube: choose a random permutation of
the bits $\pi : [d] \to [d]$, and given $u \sim_\rho t$, fix each bit in the order defined by $\pi$. So consider $i_1 = \pi(1)$. If $t_{i_1} \ne u_{i_1}$, and
the flow is currently at the node x, then flip the $i_1$-th bit of x, and continue for $i_2 = \pi(2), i_3, \ldots, i_d = \pi(d)$.

Each permutation $\pi$ defines a routing scheme, and we can average over all permutations $\pi$ and this results in a
routing scheme that routes H in G.
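The bit-fixing rule for a single permutation can be written down directly. This is a minimal sketch with hypothetical helper names, in which hypercube nodes are integers and bit i of the integer is coordinate i; `perm` lists the coordinates in the order in which they are corrected.

```python
import random

def bit_fixing_path(t, u, perm):
    """Hypercube nodes visited when the coordinates where t and u differ
    are corrected one at a time, in the order given by perm."""
    path = [t]
    x = t
    for i in perm:
        if (t >> i) & 1 != (u >> i) & 1:
            x ^= 1 << i          # flip coordinate i of the current node
            path.append(x)
    return path

def sample_noisy(t, d, flip, rng):
    """Sample u from t by flipping each of the d bits independently with probability flip."""
    u = t
    for i in range(d):
        if rng.random() < flip:
            u ^= 1 << i
    return u

rng = random.Random(0)
d = 4
perm = list(range(d))
rng.shuffle(perm)                # a uniformly random order on the coordinates
t = 0b0000
u = sample_noisy(t, d, 0.25, rng)
print(bit_fixing_path(t, u, perm))
```

The path always ends at u and has exactly Hamm(t,u) edges, which is the fact used when the total congestion is computed.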

Claim 4. This routing scheme is symmetric with respect to the automorphisms $J_s$ and $J_\pi$ of G defined above.

Corollary 2. The congestion on any edge in the Hypercube incurred by this routing scheme is the same. 

Lemma 4. The above routing scheme will achieve congestion at most 0(1) for routing H in G. 

Proof: Since the congestion of any edge in the Hypercube under this routing scheme is the same, we can calculate
the worst case congestion on any edge by calculating the average congestion. Using a symmetry argument, we can
consider any fixed terminal $z_s$ and calculate the expected increase in average congestion when sampling a random
permutation $\pi : [d] \to [d]$ and routing all the edges out of $z_s$ in H using $\pi$. This expected value will be k times the
average congestion, and hence the worst-case congestion of routing H in G according to the above routing scheme.

As we noted above, we can define H equivalently as arising from the random process of sampling $u \sim_\rho t$, and
routing an infinitesimal fraction $\Delta$ of the total capacity out of $z_t$ to $z_u$, and repeating until all of the capacity
is allocated. We can then calculate the expected increase in average congestion (under a random permutation
$\pi$) caused by routing the edges out of $z_t$ as the expected increase in average congestion divided by the total fraction
of the capacity allocated when we choose the target u from $u \sim_\rho t$. In particular, if we allocated a $\Delta$ fraction
of the capacity, the expected increase in total congestion is just the total capacity that we route multiplied by
the length of the path. Of course, the length of this path is just the number of bits in which u and t differ, which in
expectation is $\Theta(\sqrt{d})$ by our choice of $\rho$.



So in this procedure, we allocate $\Delta\sqrt{d}$ total capacity, and the expected increase in total congestion is the total
capacity routed $\Delta\sqrt{d}$ times the expected path length $\Theta(\sqrt{d})$. We repeat this procedure $\frac{1}{\Delta}$ times, and so the expected
increase in total congestion caused by routing the edges out of $z_t$ in G is $\Theta(d)$. If we perform this procedure for
each terminal, the resulting total congestion is $\Theta(kd)$, and because there are $\frac{kd}{2}$ edges in the Hypercube, the average
congestion is $\Theta(1)$, which implies that the worst-case congestion on any edge in the Hypercube is also O(1), as
desired. Also, the congestion on any edge $(z_s, y_s)$ is 1 because there is a total of $\sqrt{d}$ capacity out of $z_s$ in H, and this
is the only flow routed on this edge, which has capacity $\sqrt{d}$ in G by construction. So the worst-case congestion on
any edge in the above routing scheme is O(1). □
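For small d the average load per hypercube edge can be computed exactly. The sketch below rests on several assumptions that are not restated in this section: hypercube edges have unit capacity, each edge $(z_s, z_t)$ of H has capacity $\sqrt{d}\,\Pr_{u \sim_\rho t}[u = s]$ with per-bit flip probability $(1 - \rho)/2 = \frac{1}{2\sqrt{d}}$, and each edge of H is routed once along a path of Hamm(s,t) hypercube edges. The function name is illustrative.

```python
import math

def average_congestion(d):
    """Average load per hypercube edge when every edge of H is routed on a
    bit-fixing path (assumes unit-capacity hypercube edges)."""
    k = 2 ** d
    flip = 1.0 / (2.0 * math.sqrt(d))        # (1 - rho) / 2 with rho = 1 - 1/sqrt(d)
    total_load = 0.0
    for s in range(k):
        for t in range(s + 1, k):            # each edge of H counted once
            i = bin(s ^ t).count("1")
            cap = math.sqrt(d) * flip ** i * (1.0 - flip) ** (d - i)
            total_load += cap * i            # a path of length i loads i hypercube edges
    return total_load / (d * k / 2)          # the hypercube has d * 2^(d-1) edges

for d in (2, 4, 6):
    print(d, average_congestion(d))
```

Under these conventions the total load is $\frac{1}{2} \cdot k \cdot \sqrt{d} \cdot \frac{\sqrt{d}}{2} = \frac{kd}{4}$, so the average is a constant independent of d, in line with the $\Theta(1)$ bound in the proof; by the symmetry of the routing scheme the same constant bounds the load on every hypercube edge.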



Corollary 3. For any $A \subset K$, $h'(A) \le O(1)\,h_K(A)$.

Proof: Consider any set $A \subset K$. Let U be the minimum cut in G separating A from K - A. Then the total flow
routed from A to K - A in H is just $h'(A)$, and if this flow can be routed in G with congestion O(1), this implies that
the total capacity crossing the cut from U to V - U is at least $\Omega(1)\,h'(A)$. And of course the total capacity crossing
the cut from U to V - U is just $h_K(A)$ by the definition of U, which implies the corollary. □

So we know that the cuts in H are never too much larger than the corresponding minimum cut in G, and all
that remains to show that the quality of H is $o(\sqrt{\log k})$ is to show that the cuts in H are never too small. We
conjecture that the quality of H is actually $\Theta(\log^{1/4} k)$, and this seems natural since the quality of H just restricted
to the Majority Cut and the sub-cube cuts is actually $\Theta(\log^{1/4} k)$, and often the Boolean functions corresponding
to these cuts serve as extremal examples in the harmonic analysis of Boolean functions. In fact, our lower bound
on the quality of any cut-sparsifier for G is based only on analyzing these cuts, so in a sense our lower bound is
tight given the choice of cuts in G that we used to derive infeasibility in the system of inequalities characterizing
$\alpha$-quality cut-sparsifiers.

4.2 A Fourier Theoretic Characterization of Cuts in H 

Here we give a simple formula for the size of a cut in H, given the Fourier representation of the cut. So here we
consider cuts $A \subset K$ to be Boolean functions of the form $f_A : \{-1,+1\}^d \to \{-1,+1\}$ s.t. $f_A(s) = +1$ iff $z_s \in A$.

Lemma 5. $h'(A) = k\,\frac{\sqrt{d}}{2}\cdot\frac{1 - NS_\rho[f_A(x)]}{2}$

Proof: We can again use the infinitesimal characterization for H, in which we choose $u \sim_\rho t$ and allocate $\Delta$ units
of capacity from $z_u$ to $z_t$ and repeat until all $\sqrt{d}$ units of capacity are spent.

If we instead choose $z_t$ uniformly at random, and then choose $u \sim_\rho t$ and allocate $\Delta$ units of capacity from $z_u$ to
$z_t$, and repeat this procedure until all $k\frac{\sqrt{d}}{2}$ units of capacity are spent, then at each step the expected contribution
to the cut is exactly $\Delta\,\frac{1 - NS_\rho[f_A(x)]}{2}$ because $\frac{1 - NS_\rho[f_A(x)]}{2}$ is exactly the probability that if we choose t uniformly at
random, and $u \sim_\rho t$, that $f_A(u) \ne f_A(t)$, which means that this edge contributes to the cut. We repeat this procedure
$\frac{k\sqrt{d}}{2\Delta}$ times, so this implies the lemma. □



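Lemma 6. $h'(A) = \Theta\big(k \sum_S \hat{f}_S^2 \min(|S|, \sqrt{d})\big)$

Proof: Using the setting $\rho = 1 - \frac{1}{\sqrt{d}}$, we can compute $h'(A)$ using the above lemma:

$$h'(A) = k\,\frac{\sqrt{d}}{4}\big(1 - NS_\rho[f_A(x)]\big)$$

And using Parseval's Theorem, $\sum_S \hat{f}_S^2 = \|f\|_2^2 = 1$, so we can replace 1 with $\sum_S \hat{f}_S^2$ in the above equation and,
writing $NS_\rho[f_A(x)] = \sum_S \rho^{|S|} \hat{f}_S^2$, this implies

$$h'(A) = k\,\frac{\sqrt{d}}{4} \sum_S \hat{f}_S^2 \Big(1 - \big(1 - \tfrac{1}{\sqrt{d}}\big)^{|S|}\Big)$$

Consider the term $1 - (1 - \frac{1}{\sqrt{d}})^{|S|}$. For $|S| \le \sqrt{d}$, this term is $\Theta(\frac{|S|}{\sqrt{d}})$, and if $|S| > \sqrt{d}$, this term is $\Theta(1)$. So this
implies

$$h'(A) = \Theta\Big(k \sum_S \hat{f}_S^2 \min(|S|, \sqrt{d})\Big)$$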
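The Fourier characterization can be checked against a direct computation for small d. The sketch below makes the following assumptions explicit: the edge $(z_s, z_t)$ of H has capacity $\sqrt{d}\,\Pr_{u \sim_\rho t}[u = s]$, each bit is flipped with probability $(1 - \rho)/2$, and $f_A$ takes values in $\{-1,+1\}$. Under these conventions the cut equals $\frac{k\sqrt{d}}{4} \sum_S \hat{f}_S^2 (1 - \rho^{|S|})$; the leading constant is derived under these conventions and the function names are illustrative.

```python
import math
from itertools import product

def fourier_coefficients(f, d):
    """hat f_S = E_x[f(x) * prod_{i in S} x_i], with S given as a 0/1 indicator tuple."""
    points = list(product((-1, 1), repeat=d))
    coeffs = {}
    for S in product((0, 1), repeat=d):
        total = 0.0
        for x in points:
            chi = 1
            for in_S, xi in zip(S, x):
                if in_S:
                    chi *= xi
            total += f(x) * chi
        coeffs[S] = total / len(points)
    return coeffs

def cut_direct(f, d):
    """Sum of the capacities of the edges of H whose endpoints get different values under f."""
    flip = 1.0 / (2.0 * math.sqrt(d))        # (1 - rho) / 2 with rho = 1 - 1/sqrt(d)
    points = list(product((-1, 1), repeat=d))
    total = 0.0
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            x, y = points[a], points[b]
            if f(x) != f(y):
                i = sum(1 for xi, yi in zip(x, y) if xi != yi)
                total += math.sqrt(d) * flip ** i * (1.0 - flip) ** (d - i)
    return total

def cut_fourier(f, d):
    """(k * sqrt(d) / 4) * sum_S hat f_S^2 * (1 - rho^|S|)."""
    rho = 1.0 - 1.0 / math.sqrt(d)
    coeffs = fourier_coefficients(f, d)
    weight = sum(c * c * (1.0 - rho ** sum(S)) for S, c in coeffs.items())
    return 2 ** d * math.sqrt(d) / 4.0 * weight

majority = lambda x: 1 if sum(x) > 0 else -1
print(cut_direct(majority, 3), cut_fourier(majority, 3))
```

Functions whose Fourier weight sits on large sets S, such as parities of many coordinates, give the largest cuts in H, which is the behaviour that the later case analysis relies on.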



4.3 Small Set Expansion of H 

The edge-isoperimetric constant of the Hypercube is 1, but on subsets of the cube that are imbalanced, the
Hypercube expands more than this.

Definition 13. For a given set $A \subset \{-1,+1\}^{[d]}$ we define $bal(A) = \frac{1}{k}\min(|A|, k - |A|)$ as the balance of the set A.

Given any set $A \subset \{-1,+1\}^{[d]}$ of balance $b = bal(A)$, the number of edges crossing the cut $(A, \{-1,+1\}^{[d]} - A)$
in the Hypercube is $\Omega(bk \log\frac{1}{b})$. So the Hypercube expands better on small sets, and we will prove a similar
small set expansion result for the cut-sparsifier H. In fact, for any set $A \subset K$ (which we will associate with a
subset of $\{-1,+1\}^{[d]}$ and abuse notation), $h'(A) \ge bal(A)\,k\,\Omega(\min(\log\frac{1}{bal(A)}, \sqrt{d}))$. We will prove this result using
the Hypercontractive Inequality.
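The small set expansion of the Hypercube itself can be verified exhaustively in low dimension. The sketch below checks the standard edge-isoperimetric inequality $|\partial A| \ge |A| \log_2(k/|A|)$, which is the statement $\Omega(bk \log\frac{1}{b})$ with an explicit constant, for every set of at most half the vertices; node labels are integers and the function names are illustrative.

```python
import math
from itertools import combinations

def edge_boundary(A, d):
    """Number of hypercube edges with exactly one endpoint in A."""
    A = set(A)
    return sum(1 for x in A for i in range(d) if (x ^ (1 << i)) not in A)

def check_small_set_expansion(d):
    """True if every set A with |A| <= k/2 has at least |A| * log2(k / |A|) boundary edges."""
    k = 2 ** d
    for size in range(1, k // 2 + 1):
        for A in combinations(range(k), size):
            if edge_boundary(A, d) < size * math.log2(k / size) - 1e-9:
                return False
    return True

print(check_small_set_expansion(3))
```

Sub-cubes meet the bound with equality, which is one reason they appear as extremal cuts in the lower bound argument.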

Lemma 7. $h'(A) \ge bal(A)\,k\,\Omega(\min(\log\frac{1}{bal(A)}, \sqrt{d}))$

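Proof: Assume that $|A| \le |\{-1,+1\}^{[d]} - A|$ without loss of generality. Throughout just this proof, we will use the
notation that $f_A : \{-1,+1\}^d \to \{0,1\}$ and $f_A(s) = 1$ iff $s \in A$. Also we will denote $b = bal(A)$.

Let $\gamma \ll 1$ be chosen later. Then we will invoke the Hypercontractive Inequality with $q = 2$, $p = 2 - \gamma$, and
$\rho = \sqrt{\frac{p-1}{q-1}} = \sqrt{1 - \gamma}$. Then

$$\|f\|_p = E_x[f(x)^p]^{1/p} = b^{1/p} \approx b^{\frac{1}{2}(1 + \gamma/2)}$$

Also $\|T_\rho(f(x))\|_q = \|T_\rho(f(x))\|_2 = \sqrt{\sum_S \rho^{2|S|} \hat{f}_S^2}$. So the Hypercontractive Inequality implies

$$\sum_S \rho^{2|S|} \hat{f}_S^2 \le b^{2/p} \approx b^{1 + \gamma/2} = b\,e^{-\frac{\gamma}{2}\ln\frac{1}{b}}$$

And $\rho^{2|S|} = (1 - \gamma)^{|S|}$. Using Parseval's Theorem, $\sum_S \hat{f}_S^2 = \|f\|_2^2 = b$, we can re-write the above inequality
as

$$b - b\,e^{-\frac{\gamma}{2}\ln\frac{1}{b}} \le b - \sum_S (1 - \gamma)^{|S|} \hat{f}_S^2 = \sum_S \hat{f}_S^2 \big(1 - (1 - \gamma)^{|S|}\big) = \Theta\Big(\sum_S \hat{f}_S^2 \min(|S|\gamma, 1)\Big)$$

This implies

$$\frac{b}{\gamma}\Big(1 - e^{-\frac{\gamma}{2}\ln\frac{1}{b}}\Big) \le \Theta\Big(\sum_S \hat{f}_S^2 \min\big(|S|, \tfrac{1}{\gamma}\big)\Big)$$

And as long as $\frac{1}{\gamma} \le \sqrt{d}$,

$$\sum_S \hat{f}_S^2 \min\big(|S|, \tfrac{1}{\gamma}\big) \le \frac{1}{k}\,O(h'(A)) = O\Big(\sum_S \hat{f}_S^2 \min(|S|, \sqrt{d})\Big)$$

If $\frac{\gamma}{2}\ln\frac{1}{b} \le 1$, then $e^{-\frac{\gamma}{2}\ln\frac{1}{b}} \le 1 - \Omega(\frac{\gamma}{2}\ln\frac{1}{b})$, which implies

$$\frac{b}{\gamma}\Big(1 - e^{-\frac{\gamma}{2}\ln\frac{1}{b}}\Big) \ge \Omega\Big(b\ln\frac{1}{b}\Big)$$

However if $\ln\frac{1}{b} = \Omega(\sqrt{d})$, then we cannot choose $\gamma$ to be small enough (we must choose $\frac{1}{\gamma} \le \sqrt{d}$) in order to
make $\frac{\gamma}{2}\ln\frac{1}{b}$ small.

So the only remaining case is when $\ln\frac{1}{b} = \Omega(\sqrt{d})$. Then notice that the quantity $(1 - e^{-\frac{\gamma}{2}\ln\frac{1}{b}})$ is increasing with
decreasing b. So we can lower bound this term by substituting $b = e^{-\Theta(\sqrt{d})}$. If we choose $\gamma = \frac{1}{\sqrt{d}}$ then this implies

$$\frac{1}{\gamma}\Big(1 - e^{-\frac{\gamma}{2}\ln\frac{1}{b}}\Big) = \Omega(\sqrt{d})$$

And this in turn implies that

$$\frac{b}{\gamma}\Big(1 - e^{-\frac{\gamma}{2}\ln\frac{1}{b}}\Big) = \Omega(b\sqrt{d})$$

which yields $h'(A) \ge \Omega(bk\sqrt{d})$. So in either case, $h'(A)$ is lower bounded by either $\Omega(bk\sqrt{d})$ or $\Omega(bk\ln\frac{1}{b})$, as
desired.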

□ 



4.4 Interpolating Between Cuts via Bourgain's Junta Theorem 

In this section, we show that the quality of the cut-sparsifier H is $o(\sqrt{\log k})$, thus beating how well the best
distribution on 0-extensions can approximate cuts in G by a super-constant factor.

We will first give an outline of how we intend to combine Bourgain's Junta Theorem and the small set expansion
of H in order to yield this result. In a previous section, we gave a Fourier theoretic characterization of the cut
function of H. We will consider an arbitrary cut $A \subset K$ and assume for simplicity that $|A| \le |K - A|$. If the Boolean
function $f_A$ that corresponds to this cut has significant mass at the tail of the spectrum, this will imply (by our
Fourier theoretic characterization of the cut function) that the capacity of the corresponding cut in H is $\omega(k)$.
Every cut in G has capacity at most $O(k\sqrt{d})$ because we can just cut every edge $(z_s, y_s)$ for each terminal $z_s \in A$,
and each such edge has capacity $\sqrt{d}$. Then in this case, the ratio of the minimum cut in G to the corresponding cut
in H is $o(\sqrt{d})$.

But if the tail of the Fourier spectrum of $f_A$ is not significant, then applying Bourgain's Junta Theorem implies
that the function $f_A$ is close to a junta. Any junta will have a small cut in G (we can take axis cuts corresponding to
each variable in the junta) and so for any function that is different from a junta on a vanishing fraction of the inputs,
we will be able to construct a cut in G (not necessarily minimum) that has capacity $o(k\sqrt{d})$. On all balanced cuts
(i.e. $|A| = \Theta(k)$), the capacity of the cut in H will be $\Omega(k)$, so again in this case the ratio of the minimum cut in G
to the corresponding cut in H is $o(\sqrt{d})$.

So the only remaining case is when $|A| = o(k)$, and from the small set expansion of H the capacity of the cut in
H is $\omega(|A|)$ because the cut is imbalanced. Yet the minimum cut in G is again at most $|A|\sqrt{d}$, so in this case as well
the ratio of the minimum cut in G to the corresponding cut in H is $o(\sqrt{d})$.



Theorem |2l There is an infinite family of graphs for which the quality of the best cut-sparsifier is ^(is^^^if^ifp) 
better than the best that a distribution on 0-extensions can achieve. 
We repeat Bourgain's Junta Theorem: 

Theorem 6 (Bourgain). STil, [13^ Let f{-\,+\f ^ [-\, + \} be a Boolean function. Then fix any e,S e (0,1/ \0). 
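Suppose that

$$\sum_S (1 - \epsilon)^{|S|} \hat{f}_S^2 \ge 1 - \delta$$

then for every $\beta > 0$, f is a

$$\Big(2^{c\sqrt{\log(1/\delta)\,\log\log(1/\epsilon)}}\Big(\frac{\delta}{\sqrt{\epsilon}} + 4^{1/\epsilon}\sqrt{\beta}\Big),\ \frac{1}{\beta}\Big)\text{-junta}$$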

We will choose: 
i-^logJ 

p " 

jr=log''^d 

And also let $b = bal(A) = \frac{|A|}{k}$, and remember for simplicity we have assumed that $|A| \le |K - A|$, so $b \le \frac{1}{2}$.
$\delta = \delta' b$

Lemma 8. //Z^Cl - ef^fj < 1-6 then this implies ^5 ^min(|5|, V^) > Q(f ) = Q.{b\og^l^d) 

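Proof: The condition $\sum_S (1 - \epsilon)^{|S|} \hat{f}_S^2 < 1 - \delta$ implies $\delta \le 1 - \sum_S (1 - \epsilon)^{|S|} \hat{f}_S^2 = O(\sum_S \hat{f}_S^2 \min(|S|\epsilon, 1))$ and rearranging
terms this implies

$$\frac{\delta}{\epsilon} \le O\Big(\sum_S \hat{f}_S^2 \min\big(|S|, \tfrac{1}{\epsilon}\big)\Big) = O\Big(\sum_S \hat{f}_S^2 \min(|S|, \sqrt{d})\Big)$$

where the last line follows because $\frac{1}{\epsilon} = O(\log d) \le O(\sqrt{d})$. □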

So combining this lemma and Lemma |6l if the conditions of Bourgain's Junta Theorem are not met, then the 
capacity of the cut in the sparsifier is Q.{kblog^^^ d). And of course, the capacity of the minimum cut in G is at 
most $kb\sqrt{d}$, because for each $z_s \in A$ we could separate A from K - A by cutting the edge $(z_s, y_s)$, each of which has
capacity $\sqrt{d}$.



Case 1. If the conditions of Bourgain 's Junta Theorem are not met, then the ratio of the minimum cut in G 



separating A from K — Ato the corresponding cut in H is at most 0{ ^^^ip ). 



But what if the conditions of Bourgain's Junta Theorem are met? We can check what Bourgain's Junta Theorem 
implies for the given choice of parameters. We first consider the case when b is not too small. In particular, for 
our choice of parameters the following 3 inequalities hold: 

> 41/^ (2) 
^blog-^l^d > d^'^logd (3) 

Claim 5. If0 is true, (-^ + 4^/^ ^s) - o[blog^^'^d) 

Claim 6. //(C]) and (0) are true, 2^ Vi^g^7^i^gi°g^(-^ +4^/^ ^Jp) - o(blog-^'^d) 



So when we apply Bourgain's Junta Theorem, if the conditions are met (for our given choice of parameters), 
we get that /a is an {o{blog~^^^ d) ,0{d^''^logd)y}unta. 

Lemma 9. If $f_A$ is a $(\nu, j)$-junta, then $h_K(A) \le k\nu\sqrt{d} + \frac{jk}{2}$.

Proof: Let g be a j-junta s.t. $\Pr_x[f_A(x) \ne g(x)] \le \nu$. Then we can disconnect the set of nodes on the Hypercube
where g takes a value +1 from the set of nodes where g takes a value -1 by performing an axis cut for each variable
that g depends on. Each such axis cut cuts $\frac{k}{2}$ edges in the Hypercube, so the total cost of cutting these edges is
$\frac{jk}{2}$, and then we can additionally cut the edge $(z_s, y_s)$ for any s s.t. $f_A(s) \ne g(s)$, and this will be a cut separating A
from K - A, and these extra edges cut are each of capacity $\sqrt{d}$ and we cut at most $\nu k$ of these edges in total. □
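The cut built in this proof can be constructed and verified explicitly on a small instance. This is a minimal sketch under assumed conventions: hypercube nodes are `("y", s)` and terminals are `("z", s)` for integers s, hypercube edges have unit capacity, and terminal edges have capacity $\sqrt{d}$. The functions f and g and all helper names are hypothetical examples.

```python
import math
from collections import deque

def junta_cut(d, f, g, junta_vars):
    """Edges removed: one axis cut per junta variable, plus (z_s, y_s) wherever f and g disagree."""
    removed = set()
    for i in junta_vars:
        for s in range(2 ** d):
            if not (s >> i) & 1:
                removed.add((("y", s), ("y", s | (1 << i))))
    for s in range(2 ** d):
        if f(s) != g(s):
            removed.add((("z", s), ("y", s)))
    return removed

def separates(d, f, removed):
    """True if no terminal with f = +1 can reach a terminal with f = -1 once `removed` is deleted."""
    adj = {}
    def add(u, v):
        if (u, v) not in removed and (v, u) not in removed:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    for s in range(2 ** d):
        add(("z", s), ("y", s))
        for i in range(d):
            if not (s >> i) & 1:
                add(("y", s), ("y", s | (1 << i)))
    for s in range(2 ** d):
        if f(s) != 1:
            continue
        seen = {("z", s)}
        queue = deque([("z", s)])
        while queue:
            u = queue.popleft()
            if u[0] == "z" and f(u[1]) == -1:
                return False
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return True

def cut_cost(d, removed):
    """Terminal edges cost sqrt(d); hypercube edges cost 1 (assumed unit capacity)."""
    return sum(math.sqrt(d) if u[0] == "z" else 1.0 for (u, v) in removed)

d = 3
g = lambda s: 1 if s & 1 else -1            # a 1-junta depending only on coordinate 0
f = lambda s: -g(s) if s == 0b110 else g(s)  # differs from g on a 1/8 fraction of inputs
cut = junta_cut(d, f, g, [0])
print(separates(d, f, cut), cut_cost(d, cut))
```

After the axis cuts, g is constant on every remaining component of the Hypercube, and the only terminals still attached are those where f agrees with g, which is why the construction separates A from K - A.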



So if /a is an[o[blog-^'^d),0(d^''^\ogd))-iunta and © holds, then hK{A) < 6>U 



kb^^d 



Case 2. Suppose the conditions of Bourgain's Junta Theorem are met, and dZ)© cind (13) are true, then the ratio 
of the minimum cut in G separating A from K — Ato the corresponding cut in H is at most 0( ^^^s^ )- 

Proof: Lemma|7]also implies that the edge expansion of H is 0(1), so given a cut |A|, h'(A) > Q(|A|) = Q.{kb). Yet 
under the conditions of this case, the capacity of the cut in G is j and this implies the statement. □ 

So, the only remaining case is when the conditions of Bourgain's Junta Theorem are met and at least 1 of the 3
conditions is not true. Yet we can apply Lemma 7 directly to get that in this case $h'(A) = \omega(|A|)$ and of course
$h_K(A) \le |A|\sqrt{d}$.



Case 3. Suppose the conditions of Bourgain's Junta Theorem are met, and at least 1 of the 3 inequalities is not 
true, then the ratio of the minimum cut in G separating A from K—A to the corresponding cut in H is at most 

VdlogloglogJ^ 
log^logrf >■ 



Proof: If © is false, log(lM') + log(l/ft) - log(lM) > ^"tiZ'l'f = ^(iSS)" ^ince 1/6' = 0(loglog J), it 



must be the case that log(l/Z7) = Q( i- 



If © is false, b < = 0{d-^'^log^'^ d), and log(l/Z^) = 0(log J). 

If © is false, b < d'^l'^ log^'^ d and log(l/Z?) - Q.{\ogd). 

The minimum of the 3 bounds is the first one. So, \og{\ lb) = ^{^■^^0^'^ if at least 1 of the 3 conditions is 
false. Applying Lemma HI we get that /j'(A) > 0(|A|log i) = 0(|A|i|^|fp). And yet hK{A) < \A\ V^, and this 
implies the statement. Combining the cases, this implies that the quality of H is 0( '^'°g'°g'°g'^ )_ □ 

± -I ^ ^ log log J 



Conjecture 1. The quality ofH as a cut- spar sifier for G is 0(d ' ) 

Theorem |2l There is an infinite family of graphs for which the quality of the best cut-sparsifier is 0(j^^^^|^|p) 
better than the best that a distribution on 0-extensions can achieve. 



5 Improved Constructions via Lifting 



In this section we give a polynomial time construction for a flow-sparsifier that achieves quality at most the
quality of the best flow-sparsifier that can be realized as a distribution over 0-extensions. Thus we give a
construction for flow-sparsifiers (and thus also cut-sparsifiers) that achieve quality $O(\frac{\log k}{\log\log k})$. Given that the current
best upper bounds on the quality of both flow and cut-sparsifiers are achieved as a distribution over 0-extensions,
the constructive result we present here matches the best known existential bounds on the quality of cut or flow- 
sparsifiers. All previous constructions ifTSl . lITSl need to sacrifice some super-constant factor in order to actually 
construct cut or flow-sparsifiers. We achieve this using a linear program that can be interpreted as a lifting of 
previous linear programs used in constructive results. 

Our technique, we believe, is of independent interest: we perform a lifting on an appropriate linear program. 
This lifting allows us to implicitly enforce a constraint automatically that previously was difficult to enforce, and 
required an approximate separation oracle rather than an exact separation oracle. 

There are known ways for implicitly enforcing this constraint using an exponential number of variables, but 
surprisingly we are able to implicitly enforce this constraint using only polynomially many variables, after just a 
single lifting operation. The lifting operation that we perform is inspired by Earth-mover relaxations, and makes 
it a rare example of when an algorithm is actually able to use the Earth-mover constraints, as opposed to the usual 
use of such constraints in obtaining hardness from integrality gaps. 

Theorem 7. Given a flow sparsifier instance $\mathcal{H} = (G, k)$, there is a polynomial (in n and k) time algorithm that
outputs a flow sparsifier H of quality $\alpha \le \alpha'(\mathcal{H})$, where $\alpha'(\mathcal{H})$ is the quality of the best flow sparsifier that can be
realized as a distribution over 0-extensions.

Proof: We show that the following LP can give a flow-sparsifier with the desired properties: 
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Lemma 10. The value of the LP is a < a'CH). 

Proof: Let 9^ be the best distribution of 0-extensions. We explicitly give a satisfying assignment for all the 
variables : 
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It's easy to see that the graph H formed by is exactly the same as the flow sparsifier obtained from T. So H 
can be routed in G with congestion at most $\alpha'$. One can also verify that all the other constraints are satisfied. Thus,
the value of the LP is at most $\alpha'(\mathcal{H})$. □

There are qualitatively two types of constraints that are associated with good flow-sparsifiers H. The first is that all flows
routable in H with congestion at most 1 must be routable in G with congestion at most $\alpha$. Actually, there is a
notion of a hardest flow feasible in H to route in G: the flow that saturates all edges in H (i.e. $\vec{H}$). So the constraint
that all flows routable in H with congestion at most 1 be also routable in G with congestion at most $\alpha$ can be
enforced by ensuring that $\vec{H}$ can be routed in G with congestion at most $\alpha$. This constraint can be written using an
infinite number of linear constraints on H associated with the dual to a maximum concurrent flow problem, and in
fact an oracle for the maximum concurrent flow problem can serve as a separation oracle for these constraints.

The second set of constraints associated with good flow-sparsifiers are that all flows routable in G with congestion
at most 1 can also be routed in H with congestion at most 1. This constraint can also be written as an infinite
number of linear constraints on H, but no polynomial time separation oracle is known for these constraints. Instead,
previous work relied on using oblivious routing guarantees to get an approximate separation oracle for this
problem.

Intuitively, the constraint that all flows routable in G can be routed in H can be enforced in a number of
ways. The strategy outlined in the preceding paragraph attempts to incorporate these constraints into the linear
program. Alternatively, one could enforce that H be realized as a distribution over 0-extensions $G_f$. This would
automatically enforce that all flows routable in G would also be routable in H. However, this would require a
variable for each 0-extension $G_f$, and there would be exponentially many such variables.

Yet the above linear programming formulation is a hybrid between these two approaches. In previous linear
programming formulations, the sparsifier H was not required to be explicitly generated from G, hence the need
to enforce that it actually be a flow-sparsifier. When there is a variable for each 0-extension, then H is forced to
be generated from G and this constraint is implicitly satisfied. Yet just enforcing the Earth-mover constraints, as
above, actually forces H to have enough structure inherited from G that H is automatically a flow-sparsifier! This
is the reason that we are able to get improved constructive results. To re-iterate, a simple lifting (corresponding
to the Earth-mover constraints) does actually impose enough structure on H that we can implicitly impose the
constraint that H be a flow-sparsifier without using exponentially many variables, one for each 0-extension $G_f$!

Lemma 11. $\{w_{i,j} : i, j \in K, i < j\}$ is a flow sparsifier of quality $\alpha$.

Proof: Let H be the capacitated graph on K formed by $\{w_{i,j}\}$. The LP system guarantees that H can be routed in
G with congestion at most $\alpha$, and thus we only need to show the other direction: every multi-commodity flow in G
with end points in K can be routed in H with congestion at most 1.

Consider a multi-commodity flow $\{f_{i,j} : i, j \in K, i < j\}$ that can be routed in G. By LP duality, we have

$$\sum_{u<v} c(u,v)\,\delta(u,v) \ge \sum_{i<j} f_{i,j}\,\delta(i,j)$$

for every metric $\delta$ over V.

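Let $\delta'$ be any metric over K, then

$$\sum_{i<j} \delta'(i,j)\,w_{i,j} = \sum_{i<j} \delta'(i,j) \sum_{u<v} c(u,v)\,x^{u,v}_{i,j} = \sum_{u<v} c(u,v) \sum_{i<j} x^{u,v}_{i,j}\,\delta'(i,j) \ge \sum_{u<v} c(u,v)\,EMD_{\delta'}(x^u, x^v).$$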



Define $\delta(u,v) = EMD_{\delta'}(x^u, x^v)$. Clearly, $\delta$ is a metric over V and $\delta(i,j) = \delta'(i,j)$ for every $i, j \in K$. We have

$$\sum_{i<j} \delta'(i,j)\,w_{i,j} \ge \sum_{u<v} c(u,v)\,\delta(u,v) \ge \sum_{i<j} f_{i,j}\,\delta(i,j) = \sum_{i<j} f_{i,j}\,\delta'(i,j)$$

We have proved that $\sum_{i<j} \delta'(i,j)\,w_{i,j} \ge \sum_{i<j} f_{i,j}\,\delta'(i,j)$ for every metric $\delta'$ over K. By LP duality, f can be
routed in H with congestion 1. □

Lemma 12. The LP can be solved in polynomial (in n and k) time. 

Proof: The LP contains a polynomial number of variables and hence it is sufficient to give a separation oracle
between a given point and the polytope defined by the LP. All constraints except whether or not $cong_G(\vec{H}) \le \alpha$ can
be directly checked, and for this remaining constraint the exact separation oracle is given by solving a maximum
concurrent flow problem. □ □



6 Abstract Integrality Gaps and Rounding Algorithms 

In this section, we give a generalization of the hierarchical decompositions constructed in [21]. This immediately
yields an $O(\log k)$-competitive Steiner oblivious routing scheme, which is optimal. Also, from our hierarchical
decompositions we can recover the $O(\log k)$ bound on the flow-cut gap for maximum concurrent flows given in
[17] and [1]. Additionally, we can also give an $O(\log k)$ flow-cut gap for the maximum multiflow problem, which
was originally given in [10]. This even yields an $O(\log k)$ flow-cut gap for the relaxation for the requirement cut
problem, which is given in ifTOl . In fact, we will be able to give an abstract framework to which the results in 
this section apply (and yield $O(\log k)$ flow-cut gaps for), and in this sense we are able to help explain the intrinsic
robustness of the worst-case ratio between integral cover compared to fractional packing problems in graphs.

Philosophically, this section aims to answer the question: Do we really need to pay a price in the approximation
guarantee for reducing to a graph of size k? In fact, as we will see, there is often a way to combine both the
reduction to a graph of size k and the rounding needed to actually obtain a flow-cut gap on the reduced graph into
one step! This is exactly the observation that leads to our improved approximation guarantee for Steiner oblivious
routing.

6.1 0-Decomposition 

We extend the notion of 0-extensions, which we previously defined, to a notion of 0-decompositions. Intuitively,
we would like to combine the notion of a 0-extension with that of a decomposition tree.

Again, given a 0-extension f, we will denote $G_f$ as the graph on K that results from contracting all sets of nodes
mapped to any single terminal. Then we will use $c_f$ to denote the capacity function of this graph.

Definition 14. Given a tree $T$ on $K$, and a 0-extension $f$, we can generate a 0-decomposition $G_{f,T} = (K, E_{f,T})$ as follows:

The only edges present in $G_{f,T}$ will be those in $T$, and for any edge $(a,b) \in E(T)$, let $T_a, T_b$ be the subtrees containing $a, b$ respectively that result from deleting $(a,b)$ from $T$.

Then $c_{f,T}(a,b)$ (i.e. the capacity assigned to $(a,b)$ in $G_{f,T}$) is: $c_{f,T}(a,b) = \sum_{u,v \in K,\ u \in T_a,\ v \in T_b} c_f(u,v)$.
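To make Definition 14 concrete, the following is a minimal Python sketch of how the capacities $c_{f,T}$ could be computed. The function names (`contract`, `side_of`, `zero_decomposition`) and the dictionary-based graph representation are our own illustrative assumptions and are not part of the construction itself.

```python
from collections import defaultdict

def contract(capacity, f):
    """Capacities c_f of G_f: merge every node u into its terminal f[u]."""
    c_f = defaultdict(float)
    for (u, v), c in capacity.items():
        a, b = f[u], f[v]
        if a != b:                      # edges inside one contracted set vanish
            c_f[frozenset((a, b))] += c
    return dict(c_f)

def side_of(tree_edges, removed, start):
    """Terminals reachable from `start` in T after deleting the edge `removed`."""
    adj = defaultdict(set)
    for a, b in tree_edges:
        if {a, b} != set(removed):
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def zero_decomposition(capacity, f, tree_edges):
    """Capacities c_{f,T}(a,b) for every tree edge (a,b), following Definition 14."""
    c_f = contract(capacity, f)
    c_fT = {}
    for a, b in tree_edges:
        T_a = side_of(tree_edges, (a, b), a)
        total = 0.0
        for pair, c in c_f.items():
            u, v = tuple(pair)
            if (u in T_a) != (v in T_a):   # the pair is separated by deleting (a,b)
                total += c
        c_fT[(a, b)] = total
    return c_fT
```

For instance, with terminals a, b, c, one non-terminal x mapped to a, and the path a-b-c as the tree $T$, every unit of $c_f$ between a and c is charged to both tree edges.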

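Let $\Lambda$ denote the set of 0-extensions, and let $\Pi$ denote the set of trees on $K$.

Claim 7. For any distribution $\gamma$ on $\Lambda \times \Pi$, and for any demand $\vec{d}$ on $K$, $\mathrm{cong}_H(\vec{d}) \le \mathrm{cong}_G(\vec{d})$ where $H = \sum_{f \in \Lambda, T \in \Pi} \gamma(f,T) G_{f,T}$.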



Proof: Clearly for all $f, T$, $\gamma(f,T)\vec{d}$ is feasible in $\gamma(f,T)G_f$ (because contracting edges only makes routing flow easier), and so because $G_{f,T}$ is a hierarchical decomposition tree for $G_f$, it follows that $\gamma(f,T)\vec{d}$ is also feasible in $\gamma(f,T)G_{f,T}$. □



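Claim 8. Given any distribution $\gamma$ on $\Lambda \times \Pi$, let $H = \sum_{f \in \Lambda, T \in \Pi} \gamma(f,T) G_{f,T}$. Then $\sup_{\vec{d}} \frac{\mathrm{cong}_G(\vec{d})}{\mathrm{cong}_H(\vec{d})} = \mathrm{cong}_G(H)$.

Theorem 8. There is a polynomial time algorithm to construct a distribution $\gamma$ on $\Lambda \times \Pi$ such that $\mathrm{cong}_G(H) = O(\log k)$ where $H = \sum_{f \in \Lambda, T \in \Pi} \gamma(f,T) G_{f,T}$.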

We want to show that there is a distribution $\gamma$ on $\Lambda \times \Pi$ such that $\mathrm{cong}_G(H) = O(\log k)$. This will yield a generalization of [21]. So as in lITSl, we set up a zero-sum game in which the first player chooses $f, T$ and plays $G_{f,T}$. The second player then chooses some metric space $d : K \times K \to \Re^+$ s.t. there is some extension of $d$ to a metric space on $V$ s.t. $\sum_{(u,v) \in E} d(u,v)c(u,v) \le 1$. Then the first player loses $\sum_{(a,b)} c_{f,T}(a,b)d(a,b)$, which we will refer to as the cost of the metric space $d$ against $G_{f,T}$.

It follows immediately from ifTSll or ifTSl that a bound of $O(\log k)$ on the game value will imply our desired structural result.

Theorem 9. The game value $\nu$ is $O(\log k)$.
Proof:

We consider an arbitrary strategy $\lambda$ for the second player, which is a distribution on metric spaces $d$ that can be realized in $G$ with at most 1 unit of distance × capacity. In fact, if we take the average metric space $\Delta = \sum_d \lambda(d)d$, then this metric space can also be realized in $G$ with at most 1 unit of distance × capacity.

So we can bound the game value by showing that for all metric spaces $\Delta$ that can be realized with at most 1 unit of distance × capacity, there is a 0-decomposition $G_{f,T}$ for which the cost against $\Delta$ is at most $O(\log k)$.

We can prove this by a randomized rounding procedure that is almost the same as the rounding procedure in [9]: Scaling up the metric space, we can assume that all distances in the extension of $\Delta$ to a metric space on $V$ are at least 1, and we assume $2^\delta$ is an upper bound on the diameter of the metric space. Then we need to first choose a 0-extension $f$ for which the cost against $\Delta$ is $O(\log k)$ times the cost of realizing $\Delta$ in $G$. We do this as follows:

Choose a random permutation $\pi(1), \pi(2), \ldots, \pi(k)$ of $K$
Choose $\beta$ uniformly at random from $[1,2]$
$D_\delta \leftarrow \{V\}$, $i \leftarrow \delta - 1$
while $D_{i+1}$ has a cluster which contains more than one terminal do
    $\beta_i \leftarrow 2^{i-1}\beta$
    for $\ell \leftarrow 1$ to $k$ do
        for every cluster $S$ in $D_{i+1}$ do
            Create a new cluster of all unassigned vertices in $S$ closer than $\beta_i$ to $\pi(\ell)$
        end for
    end for
    $i \leftarrow i - 1$
end while
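The procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `terminal_rounding`, the callable metric `dist`, and the `levels` dictionary are hypothetical, and the rule that vertices farther than $\beta_i$ from every terminal stay together inside their parent cluster is our own choice for handling a case that the pseudocode leaves implicit.

```python
import math
import random

def terminal_rounding(V, K, dist, rng=random):
    """Hierarchical partitions D_delta, ..., D_i of V using only terminals as centers.

    dist(u, v) is a metric on V (at least two points) with all nonzero distances >= 1.
    Returns a dict mapping each level to its list of clusters (sets of vertices).
    """
    diameter = max(dist(u, v) for u in V for v in V)
    delta = max(1, math.ceil(math.log2(diameter)))   # 2**delta >= diameter
    perm = list(K)
    rng.shuffle(perm)                                # random permutation of K
    beta = rng.uniform(1.0, 2.0)
    levels = {delta: [set(V)]}
    i = delta - 1
    while any(len(S & set(K)) > 1 for S in levels[i + 1]):
        beta_i = 2.0 ** (i - 1) * beta
        assigned, clusters = set(), []
        for center in perm:
            for S in levels[i + 1]:
                ball = {v for v in S if v not in assigned and dist(v, center) < beta_i}
                if ball:
                    clusters.append(ball)
                    assigned |= ball
        # Assumption: vertices too far from every terminal stay together per parent cluster.
        for S in levels[i + 1]:
            rest = S - assigned
            if rest:
                clusters.append(rest)
        levels[i] = clusters
        i -= 1
    return levels
```

Every cluster created at level $i$ is a subset of a cluster at level $i+1$, so the returned levels can be read directly as a decomposition tree.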

Then, exactly as in [9], we can construct a decomposition tree from the rounding procedure. The root node is $V$ corresponding to the partition $D_\delta$, and the children of this node are all sets in the partition $D_{\delta-1}$. Each successive $D_i$ is a refinement of $D_{i+1}$, so each set of $D_i$ is made to be a child of the corresponding set in $D_{i+1}$ that contains it. At each level $i$ of this tree, the distance to the layer above is $2^i$, and one can verify that the tree-metric associated with the decomposition tree dominates the original metric space $\Delta$ restricted to the set $K$. Note that the tree metric does not dominate $\Delta$ on $V$, because there are some nodes which are mapped to the same leaf node in this tree, and correspondingly have distance 0.

If we consider any edge $(u,v)$, we can bound the expected distance in this tree metric from the leaf node containing $u$ to the leaf node containing $v$. In fact, this expected distance is only a function of the metric space $\Delta$ restricted to $K \cup \{u,v\}$. Accordingly, for any $(u,v)$, we can regard the metric space that generates the tree-metric as a metric space on just $k+2$ points.

Formally, the rounding procedure in [9] is:

Choose a random permutation $\pi(1), \pi(2), \ldots, \pi(n)$ of $V$
Choose $\beta$ uniformly at random from $[1,2]$
$D_\delta \leftarrow \{V\}$, $i \leftarrow \delta - 1$
while $D_{i+1}$ has a cluster that is not a singleton do
    $\beta_i \leftarrow 2^{i-1}\beta$
    for $\ell \leftarrow 1$ to $n$ do
        for every cluster $S$ in $D_{i+1}$ do
            Create a new cluster of all unassigned vertices in $S$ closer than $\beta_i$ to $\pi(\ell)$
        end for
    end for
    $i \leftarrow i - 1$
end while

Formally, [9] proves a stronger statement than just that the expected distance (according to the tree-metric generated from the above rounding procedure) is $O(\log n)$ times the original distance. We will say that $u, v$ are split at level $i$ if these two nodes are contained in different sets of $D_i$. Let $X_i$ be the indicator variable for this event.

Then the distance in the tree-metric $\Delta_T$ generated from the above rounding procedure is $\Delta_T(u,v) = \sum_i 2^{i+1}X_i$. In fact, [9] proves the stronger statement that this is true even if $u, v$ are not in the metric space (i.e. $u, v \notin V$) but are always grouped in the cluster in which they would be if they were in fact in the set $V$ (provided of course that they can be grouped in such a cluster). More formally, we set $V' = V \cup \{u,v\}$ and if the step "Create a new cluster of all unassigned vertices in $S$ closer than $\beta_i$ to $\pi(\ell)$" is replaced with "Create a new cluster of all unassigned vertices in $V'$ in $S$ closer than $\beta_i$ to $\pi(\ell)$", then [9] actually proves in this more general context that $E[\Delta_T(u,v)] \le O(\log n)\Delta(u,v)$.

When we input the metric space $\Delta$ restricted to $K$ into the above rounding procedure (but at each clustering stage we consider all of $V$) then we get exactly our rounding procedure. So then the main theorem in [9] (or rather our restatement of it) is the following, where $\Delta_T$ is the tree-metric generated from the above rounding procedure:

Theorem 10. [9] For all $u, v$, $E[\Delta_T(u,v)] \le O(\log k)\Delta(u,v)$.

So at the end of the rounding procedure, we have a tree in which each leaf corresponds to a subset of $V$ that contains at most 1 terminal. We are given a tree-metric $\Delta_T$ on $V$ associated with the output of the algorithm, and this tree-metric has the property that $\sum_{(u,v) \in E} c(u,v)\Delta_T(u,v) \le O(\log k)$.

We would like to construct a tree $T'$ from $T$ which has only leaves which contain exactly one terminal. We first state a simple claim that will be instructive in order to do this:




Claim 9. Given a tree metric $\Delta_T$ on a tree $T$ on $K$, $\mathrm{cost}(G_{f,T}, \Delta_T) = \mathrm{cost}(G_f, \Delta_T)$.

Proof: The graph $G_{f,T}$ can be obtained from $G_f$ by iteratively re-routing some edge $(a,b) \in E_f$ along the path connecting $a$ and $b$ in $T$ and adding $c_f(a,b)$ capacity to each edge on this path, and finally deleting the edge $(a,b)$. The original cost of this edge is $c_f(a,b)\Delta_T(a,b)$, and if $a = p_1, p_2, \ldots, p_r = b$ is the path connecting $a$ and $b$ in $T$, the cost after performing this operation is $c_f(a,b)\sum_{i=1}^{r-1}\Delta_T(p_i, p_{i+1}) = c_f(a,b)\Delta_T(a,b)$ because $\Delta_T$ is a tree-metric. □
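The re-routing argument in this proof can be checked numerically on small instances. The sketch below is illustrative only: `tree_path` and `reroute_costs` are hypothetical helper names, the tree metric is given by edge lengths `tree_len`, and $c_f$ is given as a dictionary on pairs of terminals.

```python
from collections import defaultdict

def tree_path(tree_len, a, b):
    """Edges on the unique a-b path of the tree with edge lengths `tree_len`."""
    adj = defaultdict(list)
    for (x, y) in tree_len:
        adj[x].append(y)
        adj[y].append(x)
    parent, stack = {a: None}, [a]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path = []
    while b != a:                       # walk back from b to a along parent pointers
        path.append(frozenset((parent[b], b)))
        b = parent[b]
    return path

def reroute_costs(c_f, tree_len):
    """Return (cost(G_f, Delta_T), cost(G_{f,T}, Delta_T)) after re-routing along T."""
    length = {frozenset(e): l for e, l in tree_len.items()}
    c_fT = defaultdict(float)
    cost_before = 0.0
    for (a, b), c in c_f.items():
        path = tree_path(tree_len, a, b)
        cost_before += c * sum(length[e] for e in path)   # c_f(a,b) * Delta_T(a,b)
        for e in path:
            c_fT[e] += c                                   # add c_f(a,b) along the path
    cost_after = sum(c_fT[e] * length[e] for e in c_fT)
    return cost_before, cost_after
```

The two returned values agree because each edge of $G_f$ pays its capacity times the length of the same tree path before and after the re-routing.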

We can think of each edge $(u,v)$ as being routed between the deepest nodes in the tree that contain $u$ and $v$ respectively, and the edge pays $c(u,v)$ times the distance according to the tree-metric on this path. Then we can perform the following procedure: each time we find a node in the tree which has only leaf nodes as children and none of these leaf nodes contains a terminal, we can delete these leaf nodes. This cannot increase the cost of the edges against the tree-metric because every edge (which we regard as routed in the tree) is routed on the same, or a shorter, path. After this procedure is done, every leaf node that doesn't contain a terminal has a parent $p$ that contains a terminal node $a$. Suppose that the deepest node in the tree that contains $a$ is $c$. We can take this leaf node, delete it, and place all of its nodes in the tree-node $c$. This procedure only affects the cost of edges with one endpoint in the leaf node that we deleted, and at most doubles the cost paid by the edge because distances in the tree are geometrically decreasing. So if we iteratively perform the above steps, the total cost after performing these operations is at most 4 times the original cost.

And it is easy to see that this results in a natural 0-extension in which each node $u$ is mapped to the terminal corresponding to the deepest node that $u$ is contained in.

Each edge pays a cost proportional to a tree-metric distance between the endpoints of the edge. So we know that $\mathrm{cost}(G_f, \Delta_T) = O(\log k)$ because the cost increased by at most a factor of 4 from iteratively performing the above steps. Yet using the above Claim, we get a 0-extension $f$ and a tree $T$ such that $\mathrm{cost}(G_{f,T}, \Delta_T) = O(\log k)$, and because $\Delta_T$ dominates $\Delta$ when restricted to $K$, this implies that $\mathrm{cost}(G_{f,T}, \Delta) \le \mathrm{cost}(G_{f,T}, \Delta_T) = O(\log k)$, and this implies the bound on the game value.

□ 

In turn, using the arguments in [15], this implies:

Theorem 11. There is a distribution $\gamma$ on $\Lambda \times \Pi$ such that $\mathrm{cong}_G(H) = O(\log k)$ where $H = \sum_{f \in \Lambda, T \in \Pi} \gamma(f,T)G_{f,T}$.

Also, using the arguments in [21] (because each $G_{f,T}$ is a tree and hence has a unique routing scheme), this gives us an $O(\log k)$-competitive Steiner oblivious routing scheme:

Corollary 4. Given $G = (V,E)$ and $K \subset V$, there is a set of unit flows, one for each pair $a, b \in K$ sending a unit flow from $a$ to $b$, such that given any demand $\vec{d}$ restricted to $K$, the congestion incurred by this oblivious routing scheme is $O(\log k)$ times the minimum congestion routing of $\vec{d}$.

Actually, the above theorem can be made constructive directly using the techniques in [21], which build on [22]. We will not repeat the proof; instead we note the only minor difference in the proof.

Definition 15. Let $\mathcal{R}$ denote the set of pairs $(G_{f,T}, g)$ where $G_{f,T}$ is a 0-decomposition of $G$, and $g$ is a function from edges in $G_{f,T}$ to paths in $G$ so that an edge $(a,b)$ in $G_{f,T}$ is mapped to a path connecting $a$ and $b$ in $G$.

Given a metric space $\delta$ on $V$, we can define the notion of the cost of a pair $(G_{f,T}, g)$ against $\delta$:

Definition 16.

$\mathrm{cost}((G_{f,T}, g), \delta) = \sum_{(a,b) \in E(G_{f,T})} \sum_{(u,v) \in g(a,b)} c_{f,T}(a,b)\delta(u,v) = \sum_{(a,b) \in E(G_{f,T})} c_{f,T}(a,b)\delta(g(a,b))$

Corollary 5. For any metric $\delta$ on $V$, there is some $(G_{f,T}, g) \in \mathcal{R}$ such that:

$\mathrm{cost}((G_{f,T}, g), \delta) \le O(\log k)\sum_{(u,v)} c(u,v)\delta(u,v)$



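Proof: We can apply Theorem 11, which implies that there is a distribution $\mu$ on $\mathcal{R}$ s.t. for all edges $e \in E$,

$E_{(G_{f,T}, g) \leftarrow \mu}\left[\sum_{(a,b) \in E(G_{f,T}) \text{ s.t. } e \in g(a,b)} c_{f,T}(a,b)\right] \le O(\log k)c(e)$

because we can take the optimal routing of $H = \sum_{f \in \Lambda, T \in \Pi} \gamma(f,T)G_{f,T}$ in $G$, which requires congestion at most $O(\log k)$, and if we compute a path decomposition of the routing schemes of each $G_{f,T}$ in the support of $\gamma$, we can use these to express the routing scheme as a convex combination of pairs from $\mathcal{R}$. Multiplying by $\delta(e)$ and summing over all edges $e \in E$ bounds the expected cost against $\delta$ by $O(\log k)\sum_{(u,v)} c(u,v)\delta(u,v)$, so some pair in the support of $\mu$ achieves this bound. □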

So we can use an identical proof as in [21] to actually construct a distribution $\gamma$ on 0-decompositions s.t. for $H = \sum_{f \in \Lambda, T \in \Pi} \gamma(f,T)G_{f,T}$ we have $\mathrm{cong}_G(H) = O(\log k)$. All we need to modify is the actual packing problem. In [21], the goal of the packing problem is to pack a convex combination of decomposition trees into the graph $G$ s.t. the expected relative load on any edge is at most $O(\log n)$. Here our goal is to pack a convex combination of 0-decompositions into $G$. So instead of writing a packing problem over decomposition trees, we write a packing problem over pairs $(G_{f,T}, g) \in \mathcal{R}$ and the goal is to find a convex combination of these pairs s.t. the relative load on any edge is $O(\log k)$.

[21] finds a polynomial time algorithm by relating the change (when a decomposition tree is added to the convex combination) of the worst-case relative load (actually a convex function that dominates this maximum) to the cost of a decomposition tree against a metric. Analogously, as long as we can always (for any metric space $\delta$ on $V$) find a pair $(G_{f,T}, g)$ as in Corollary 5, an identical proof as in [21] will give us a constructive version of Theorem 11. And we can do this by again using the theorem due to [9] (which we restated above in a more convenient notation for our purposes). This will give us a 0-decomposition $G_{f,T}$ for which $\sum_{(a,b)} c_{f,T}(a,b)\delta(a,b) \le O(\log k)\sum_{(u,v)} c(u,v)\delta(u,v)$, and we still need to choose a routing of $G_{f,T}$ in $G$. We can do this in an easy way: for each edge $(a,b)$ in $G_{f,T}$, just choose the shortest path according to $\delta$ connecting $a$ and $b$ in $G$. The length of this path will be $\delta(a,b)$, and so we have that $\mathrm{cost}((G_{f,T}, g), \delta) \le O(\log k)\sum_{(u,v)} c(u,v)\delta(u,v)$ as desired. Then using the proof in [21] in our context, this immediately yields Theorem 8.
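The choice of the routing $g$ described above (a shortest path according to $\delta$ for each edge of $G_{f,T}$) can be sketched as follows. This is a minimal illustration under our own assumptions: `shortest_path` and `route_and_cost` are hypothetical names, `adj` stores the $\delta$-length of each edge of $G$ in both directions, and every terminal pair is assumed to be connected in $G$.

```python
import heapq

def shortest_path(adj, src, dst):
    """Dijkstra on adj = {u: {v: length}}; returns (length, list of edges)."""
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue                      # stale heap entry
        if u == dst:
            break
        for v, w in adj[u].items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path, x = [], dst
    while x != src:                       # rebuild the path from the predecessor map
        path.append((prev[x], x))
        x = prev[x]
    return best[dst], path[::-1]

def route_and_cost(c_fT, adj):
    """Map each edge of G_{f,T} to a shortest path in G and sum c_{f,T} times its length."""
    g, cost = {}, 0.0
    for (a, b), c in c_fT.items():
        length, path = shortest_path(adj, a, b)
        g[(a, b)] = path
        cost += c * length
    return g, cost
```

The returned cost is the quantity $\sum_{(a,b)} c_{f,T}(a,b)\delta(g(a,b))$ from Definition 16 for this particular choice of $g$.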

6.2 Applications 

Also, as we noted, this gives us an alternate proof of the main results in [17], [1] and [10]. We first give an abstract framework into which these problems all fit:

Definition [IJ We call a fractional packing problem P a graph packing problem if the goal of the dual covering 
problem D is to minimize the ratio of the total units of distance x capacity allocated in the graph divided by some 
monotone increasing function of the distances between terminals. 

Let $ID$ denote the integral dual graph covering problem. To make this definition seem more natural, we demonstrate that a number of well-studied problems fit into this framework.

Example 1. [17], [1] P: maximum concurrent flow; ID: generalized sparsest cut

Here we are given some demand vector $\vec{f}$ on pairs of terminals, and the goal is to maximize the value $r$ such that $r\vec{f}$ is feasible in $G$. Then the dual to this problem corresponds to minimizing the total distance × capacity units, divided by $\sum_{(a,b)} f_{a,b}d(a,b)$, where $d$ is the induced semi-metric on $K$. The function in the denominator is clearly a monotone increasing function of the distances between pairs of terminals, and hence is an example of what we call a graph packing problem. The generalized sparsest cut problem corresponds to the "integral" constraint on the dual, that the distance function be a cut metric.
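As a small illustration of the integral dual in this example, the sketch below evaluates the ratio for cut metrics by brute force. The names `cut_ratio` and `generalized_sparsest_cut` are hypothetical, and exhaustive enumeration is used only because the instance is tiny; it is not the rounding approach of this paper.

```python
from itertools import combinations

def cut_ratio(capacity, demand, A):
    """Dual objective for the cut metric of A: capacity crossing A over demand crossing A."""
    cut = sum(c for (u, v), c in capacity.items() if (u in A) != (v in A))
    sep = sum(f for (a, b), f in demand.items() if (a in A) != (b in A))
    return cut / sep if sep > 0 else float("inf")

def generalized_sparsest_cut(capacity, demand, V):
    """Minimum of the ratio over all nonempty proper subsets A of V (exponential time)."""
    best = float("inf")
    for r in range(1, len(V)):
        for A in combinations(V, r):
            best = min(best, cut_ratio(capacity, demand, set(A)))
    return best
```

Restricting the dual to cut metrics in this way is exactly the "integral" constraint mentioned above; the fractional dual instead ranges over all semi-metrics.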



Example 2. [10] P: maximum multiflow; ID: multicut

Here we are given some pairs of terminals $T \subset \binom{K}{2}$, and the goal is to find a flow $\vec{f}$ that can be routed in $G$ that maximizes $\sum_{(a,b) \in T} f_{a,b}$. The dual to this problem corresponds to minimizing the total distance × capacity units divided by $\min_{(a,b) \in T}\{d(a,b)\}$, again where $d$ is the induced semi-metric on $K$. Also the function in the denominator is again a monotone increasing function of the distances between pairs of terminals, and hence is another example of what we call a graph packing problem. The multicut problem corresponds to the "integral" constraint on the dual that the distance function be a partition metric.

Example 3. ID: Steiner multi-cut 

Example 4. ID: Steiner minimum-bisection 

Example 5. [19] P: multicast routing; ID: requirement cut

This is another partitioning problem, and the input is again a set of subsets $\{R_i\}_i$. Each subset $R_i$ is also given a requirement $r_i$, and the goal is to minimize the total capacity removed from $G$, in order to ensure that each subset $R_i$ is contained in at least $r_i$ different components. Similarly to the Steiner multi-cut problem, the standard relaxation for this problem is to minimize the total amount of distance × capacity units allocated in $G$, s.t. for each $i$ the minimum spanning tree (on the induced metric on $K$) on every subset $R_i$ has total distance at least $r_i$. Let $\Pi_i$ be the set of spanning trees on the subset $R_i$. Then we can again cast this relaxation in the above framework because the goal is to minimize the total distance × capacity units divided by $\min_i\left\{\frac{\min_{T \in \Pi_i}\sum_{(a,b) \in T} d(a,b)}{r_i}\right\}$. The dual to this fractional covering problem is actually a common encoding of multicast routing problems, and so these problems as well are examples of graph packing problems. Here the requirement cut problem corresponds to the "integral" constraint that the distance function be a partition metric.

In fact, one could imagine many other examples of interesting problems that fit into this framework. One can regard maximum multiflow as an unrooted problem of packing an edge fractionally into a graph $G$, and the maximum concurrent flow problem is a rooted graph packing problem where we are given a fixed graph on the terminals (corresponding to the demand) and the goal is to pack as many copies as we can into $G$ (i.e. maximizing throughput). The dual to the Steiner multi-cut is more interesting, and is actually a combination of rooted and unrooted problems where we are given subsets $R_i$ of terminals, and the goal is to maximize the total number of spanning trees over the sets $R_i$ that we pack into $G$. This is a combination of an unrooted problem (each spanning tree on any set $R_i$ counts the same) and a rooted problem (once we fix the $R_i$, we need a spanning tree on these terminals).

Then any other flow-problem that is combinatorially restricted can also be seen to fit into this framework.

As an application of our theorem in the previous section, we demonstrate that all graph packing problems can be reduced to graph packing problems on trees at the loss of an $O(\log k)$ factor. So whenever we are given a bound of, say, $C$ on the ratio of the integral covering problem to the fractional packing problem on trees, this immediately translates to an $O(C \log k)$ bound in general graphs. So in some sense, these embeddings into distributions on 0-decompositions help explain the intrinsic robustness of graph packing problems, and why the integrality gap always seems to be $O(\log k)$. In fact, since we can actually construct these distributions on 0-decompositions, we obtain an Abstract Rounding Algorithm that works for general graph packing problems.

Theorem m There is a polynomial time algorithm to construct a distribution $\mu$ on (a polynomial number of) trees on the terminal set $K$, s.t.

$E_{T \leftarrow \mu}[OPT(P,T)] \le O(\log k)\,OPT(P,G)$

and such that any valid integral dual of cost $C$ (for any tree $T$ in the support of $\mu$) can be immediately transformed into a valid integral dual in $G$ of cost at most $C$.

We first demonstrate that the operations we need to construct a 0-decomposition only make the dual to a graph packing problem more difficult: Let $\nu(G,K)$ be the optimal value of a dual to a graph packing problem on $G = (V,E)$, $K \subset V$.



Claim 10. Replacing any edge $(u,v)$ of capacity $c(u,v)$ with a path $u = p_1, p_2, \ldots, p_r = v$, deleting the edge $(u,v)$ and adding $c(u,v)$ units of capacity along the path does not decrease the optimal value of the dual.

Proof: We can scale the distance function of the optimal dual so that the monotone increasing function of the distances between terminals is exactly 1. Then the value of the dual is exactly the total capacity × distance units allocated. If we maintain the same metric space on the vertex set $V$, then the monotone increasing function of terminal distances is still exactly 1 after replacing the edge $(u,v)$ by the path $u = p_1, p_2, \ldots, p_r = v$. However this replacement does change the cost (in terms of the total distance × capacity units). Deleting the edge reduces the cost by $c(u,v)d(u,v)$, and augmenting along the path increases the cost by $c(u,v)\sum_{i=1}^{r-1} d(p_i, p_{i+1})$ which, using the triangle inequality, is at least $c(u,v)d(u,v)$. □

Claim 11. Suppose we join two nodes $u, v$ (s.t. not both of $u, v$ are terminals) into a new node $u'$, and replace each edge into $u$ or $v$ with a corresponding edge of the same capacity into $u'$. Then the optimal value of the dual does not decrease.

Proof: We can equivalently regard this operation as placing an edge of infinite capacity connecting $u$ and $v$, and this operation clearly does not change the set of distance functions for which the monotone increasing function of the terminal distances is at least 1. And so this operation can only increase the cost of the optimal dual solution. □

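We can obtain any 0-decomposition $G_{f,T}$ from some combination of these operations. So we get that for any $f, T$:

Corollary 6. $\nu(G_{f,T}, K) \ge \nu(G, K)$

Let $\gamma$ be the distribution on $\Lambda \times \Pi$ s.t. $H = \sum_{f \in \Lambda, T \in \Pi} \gamma(f,T)G_{f,T}$ and $\mathrm{cong}_G(H) \le O(\log k)$.

Lemma 13. $E_{(f,T) \leftarrow \gamma}[\nu(G_{f,T}, K)] \le O(\log k)\nu(G, K)$.

Proof: We know that there is a metric $d$ on $V$ s.t. $\sum_{(u,v)} c(u,v)d(u,v) = \nu(G,K)$ and that the monotone increasing function of $d$ (restricted to $K$) is at least 1.

We also know that there is a simultaneous routing of each $\gamma(f,T)G_{f,T}$ in $G$ so that the congestion on any edge in $G$ is $O(\log k)$. Then consider the routing of one such $\gamma(f,T)G_{f,T}$ in this simultaneous routing. Each edge $(a,b) \in E_{f,T}$ is routed to some distribution on paths connecting $a$ and $b$ in $G$. In total $\gamma(f,T)c_{f,T}(a,b)$ flow is routed on some distribution on paths, and consider a path $p$ that carries $C(p)$ total flow from $a$ to $b$ in the routing of $\gamma(f,T)G_{f,T}$. If the total distance along this path is $d(p)$, we increment the distance $d_{f,T}$ on the edge $(a,b)$ in $G_{f,T}$ by $\frac{d(p)C(p)}{\gamma(f,T)c_{f,T}(a,b)}$, and we do this for all such paths. We do this also for each $(a,b)$ in $G_{f,T}$.

If $d_{f,T}$ is the resulting semi-metric on $G_{f,T}$, then this distance function dominates $d$ restricted to $K$, because the distance that we allocate to the edge $(a,b)$ in $G_{f,T}$ is a convex combination of the distances along paths connecting $a$ and $b$ in $G$, each of which is at least $d(a,b)$.

So if we perform the above distance allocation for each $G_{f,T}$, then each resulting pair $d_{f,T}, G_{f,T}$ satisfies the condition that the monotone increasing function of terminal distances $d_{f,T}$ is at least 1. But how much distance × capacity units have we allocated in expectation?

$E_{(f,T) \leftarrow \gamma}[\nu(G_{f,T}, K)] \le \sum_{f,T} \gamma(f,T) \sum_{(a,b) \in E_{f,T}} c_{f,T}(a,b)d_{f,T}(a,b)$

We can re-write the right-hand side as a sum over the paths used in the simultaneous routing: by the definition of $d_{f,T}$, each path $p$ contributes exactly $C(p)d(p)$, so the total is $\sum_{(u,v) \in E} \mathrm{flow}(u,v)d(u,v)$, where $\mathrm{flow}(u,v)$ denotes the total flow that the simultaneous routing sends across the edge $(u,v)$. Since the congestion is $O(\log k)$, we have $\mathrm{flow}(u,v) \le O(\log k)c(u,v)$, and hence the total is at most $O(\log k)\sum_{(u,v)} c(u,v)d(u,v) = O(\log k)\nu(G,K)$. □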

And this implies:

Theorem 3. For any graph packing problem $P$, the maximum ratio of the integral dual to the fractional primal is at most $O(\log k)$ times the maximum ratio restricted to trees.

And since we can actually construct such a distribution on 0-decompositions in polynomial time, using Theorem 8, this actually gives us an Abstract Rounding Algorithm: We can just construct such a distribution on 0-decompositions, sample one at random, and apply a rounding algorithm to the tree to obtain an integral dual on the 0-decomposition $G_{f,T}$ within $O(\log k)C$ times the value of the primal packing problem on $G$. This integral dual on the 0-decomposition $G_{f,T}$ can then be easily mapped back to an integral dual on $G$ at no additional cost precisely because we can set the distance in $G_f$ of any edge $(a,b)$ to be the tree-distance according to the integral dual on $G_{f,T}$ between $a$ and $b$. Using Claim 9, this implies that the cost of the dual in $G_f$ is equal to the cost of the dual in $G_{f,T}$. And we can choose an integral dual $\delta'$ in $G$ in which for all $u, v$, $\delta'(u,v) = \delta(f(u), f(v))$ and the cost of this dual $\delta'$ on $G$ is exactly the cost of $G_f$ against $\delta$. And so we have an integral dual solution in $G$ of cost at most $O(\log k)C$ times the cost of the fractional primal packing value in $G$, where $C$ is the maximum integrality gap of the graph packing problem restricted to trees. This yields our Abstract Rounding Algorithm:

Theorem 131 There is a polynomial time algorithm to construct a distribution $\mu$ on (a polynomial number of) trees on the terminal set $K$, s.t.

$E_{T \leftarrow \mu}[OPT(P,T)] \le O(\log k)\,OPT(P,G)$

and such that any valid integral dual of cost $C$ (for any tree $T$ in the support of $\mu$) can be immediately transformed into a valid integral dual in $G$ of cost at most $C$.

Corollary 7. If there is a $C$-approximation algorithm for a graph partitioning problem restricted to trees, then there is an $O(C \log k)$-approximation algorithm for the graph partitioning problem in general graphs.

So, there is a natural, generic algorithm associated with this theorem:
1: Decompose $G$ into an $O(\log k)$-oblivious distribution $\mu$ of 0-decomposition trees;
2: Randomly select a tree $G_{f,T}$ from the distribution $\mu$;
3: Solve the problem on the tree $G_{f,T}$, and let $\delta$ be the metric the algorithm outputs;
4: Return $(\delta, f)$.
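The four steps above can be sketched as a short Python skeleton. Everything named here is a hypothetical placeholder: step 1 is assumed to have already produced `distribution` as a list of (probability, 0-extension, tree) triples, and `solve_on_tree` stands for whatever problem-specific algorithm returns an integral dual (a metric on $K$) for the sampled tree.

```python
import random

def abstract_rounding(distribution, solve_on_tree, rng=random):
    """Steps 2-4: sample a 0-decomposition, solve on the tree, pull the dual back to G.

    distribution: list of (probability, f, tree) triples, with f a dict from V to K.
    solve_on_tree: callable taking a tree and returning a metric delta(p, q) on K.
    """
    weights = [p for p, _, _ in distribution]
    _, f, tree = rng.choices(distribution, weights=weights, k=1)[0]
    delta = solve_on_tree(tree)            # integral dual on the 0-decomposition

    def delta_G(u, v):
        # Pull back to G: delta'(u, v) = delta(f(u), f(v)).
        return delta(f[u], f[v])

    return delta_G, f
```

The pulled-back function `delta_G` is the integral dual $\delta'$ on $G$ described before the theorem; the sketch says nothing about how the distribution or the tree algorithm is obtained.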

For example, this gives a generic algorithm that achieves an $O(\log k)$ guarantee for both generalized sparsest cut and multicut. The previous techniques for rounding a fractional solution to generalized sparsest cut [17], [1] rely on metric embedding results, and the techniques for rounding fractional solutions to multicut [10] rely on purely combinatorial, region-growing arguments. Yet, through this theorem, we can give a unified rounding algorithm that achieves an $O(\log k)$ guarantee for both of these problems, and more generally for graph packing problems (whenever the integrality gap restricted to trees is a constant).
A Harmonic Analysis

We consider the group $F_2^d = \{-1,+1\}^d$ equipped with the group operation $s \circ t = [s_1 * t_1, s_2 * t_2, \ldots, s_d * t_d] \in F_2^d$. Any subset $S \subset [d]$ defines a character $\chi_S(x) = \prod_{i \in S} x_i : F_2^d \to \{-1,+1\}$. See [20] for an introduction to the harmonic analysis of Boolean functions.

Then any function $f : \{-1,+1\}^d \to \Re$ can be written as:

$f(x) = \sum_S \hat{f}_S\chi_S(x)$

Fact 1. For any $S, T \subset [d]$ s.t. $S \ne T$, $E_x[\chi_S(x)\chi_T(x)] = 0$.

For any $p > 0$, we will denote the $p$-norm of $f$ as $\|f\|_p = \left(E_x[|f(x)|^p]\right)^{1/p}$. Then

Theorem 12 (Parseval).

$\sum_S \hat{f}_S^2 = E_x[f(x)^2] = \|f\|_2^2$

Definition 17. Given $-1 \le \rho \le 1$, let $y \sim_\rho x$ denote choosing $y$ depending on $x$ s.t. for each coordinate $i$, $E[y_ix_i] = \rho$.

Definition 18. Given $-1 \le \rho \le 1$, the operator $T_\rho$ maps functions on the Boolean cube to functions on the Boolean cube, and for $f : \{-1,+1\}^d \to \Re$, $T_\rho(f(x)) = E_{y \sim_\rho x}[f(y)]$.

Fact 2. $T_\rho(\chi_S(x)) = \chi_S(x)\rho^{|S|}$

In fact, because $T_\rho$ is a linear operator on functions, we can use the Fourier representation of a function $f$ to easily write the effect of applying the operator $T_\rho$ to the function $f$:

Corollary 8. $T_\rho(f(x)) = \sum_S \rho^{|S|}\hat{f}_S\chi_S(x)$

Definition 19. The Noise Stability of a function $f$ is $NS_\rho(f) = E_{x, y \sim_\rho x}[f(x)f(y)]$

Fact 3. $NS_\rho(f) = \sum_S \rho^{|S|}\hat{f}_S^2$
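Parseval's identity and Fact 3 can be checked by brute force on a small cube. The sketch below is only illustrative: the function names are ours, and we assume that $y \sim_\rho x$ is sampled coordinate-wise independently with $y_i = x_i$ with probability $(1+\rho)/2$, which gives $E[y_ix_i] = \rho$.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, d):
    """hat f_S = E_x[f(x) chi_S(x)] over x uniform in {-1,+1}^d."""
    cube = list(product((-1, 1), repeat=d))
    coeffs = {}
    for r in range(d + 1):
        for S in combinations(range(d), r):
            coeffs[S] = sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)
    return coeffs

def noise_stability(f, d, rho):
    """NS_rho(f) = E[f(x) f(y)] with y_i = x_i independently with probability (1 + rho) / 2."""
    cube = list(product((-1, 1), repeat=d))
    total = 0.0
    for x in cube:
        for y in cube:
            agree = sum(1 for i in range(d) if x[i] == y[i])
            p = ((1 + rho) / 2) ** agree * ((1 - rho) / 2) ** (d - agree)
            total += f(x) * f(y) * p       # p is the probability of y given x
    return total / len(cube)
```

For a Boolean function such as majority on three bits, the squared coefficients sum to 1, and the direct computation of the noise stability agrees with the Fourier formula of Fact 3.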



Theorem 13 (Hypercontractivity). Ml/ iEI/ For any $q \ge p \ge 1$, for any $\rho \le \sqrt{\frac{p-1}{q-1}}$,

$\|T_\rho f\|_q \le \|f\|_p$

A statement of this theorem is given in [20] and Q for example.

Definition 20. A function $g : \{-1,+1\}^d \to \Re$ is a $j$-junta if there is a set $S \subset [d]$ s.t. $|S| \le j$ and $g$ depends only on variables in $S$, i.e. for any $x, y \in F_2^d$ s.t. $x_i = y_i$ for all $i \in S$ we have $g(x) = g(y)$. We will call a function $f$ an $(\epsilon, j)$-junta if there is a function $g : \{-1,+1\}^d \to \Re$ that is a $j$-junta and $\Pr_x[f(x) \ne g(x)] \le \epsilon$.

We will use a quantitative version of Bourgain's Junta Theorem [5] that is given by Khot and Naor in [13]:

Theorem 14 (Bourgain). ^JJS Let $f : \{-1,+1\}^d \to \{-1,+1\}$ be a Boolean function. Then fix any $\epsilon, \delta \in (0, 1/10)$. Suppose that

$\sum_S (1-\epsilon)^{|S|}\hat{f}_S^2 \ge 1 - \delta$

then for every $\beta > 0$, $f$ is a

$\left(2^{c\sqrt{\log(1/\delta)\log\log(1/\epsilon)}}\left(\frac{\delta}{\sqrt{\epsilon}} + 4^{1/\epsilon}\sqrt{\beta}\right), \frac{1}{\epsilon\beta}\right)$-junta.

This theorem is often described as mysterious, or deep, has led to some breakthrough results in theoretical computer science ifTSl, [14], and is also quite subtle. For example, this theorem crucially relies on the property that $f$ is a Boolean function, and in more general cases only much weaker bounds are known Q.



